Ben Short wrote:
> subType is /Type3
> 
> Does this help identify the problem?

Yes, but it doesn't bring us closer to a solution.

Type 3 fonts are "user-defined fonts".

See for instance:
http://itextpdf.com/examples/index.php?page=example&id=200
In that example, a 'delta'-shaped glyph and a 'sigma'-shaped glyph were
defined, corresponding to the characters 'D' and 'S'. However, the
example would have worked just as well with any other characters.

Another example: we could define a glyph that looks like the symbol for
'The Artist Formerly Known As Prince' and map it to the character 'P'.
That's what Type 3 fonts are for: they can be used when a user needs a
glyph that isn't provided by any other font.
That's also why it's very hard to extract such content: how is an
extraction tool supposed to know that the glyph drawn for 'P' must be
'translated' to 'The Artist Formerly Known As Prince'? I don't think
there's a Unicode code point for that glyph.
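To make the extraction problem concrete, here is a toy model in plain Java.
It is a sketch under stated assumptions, not iText's API and not how a real
PDF parser works: the class Type3ExtractionDemo, the maps GLYPH_PROCS and
UNICODE_HINTS, and the methods render() and extract() are all hypothetical.
The point it shows: a viewer only needs the glyph procedure to draw the
page correctly, while an extractor only ever sees the character code.

```java
import java.util.Map;

// Toy model only: NOT iText's API and not how a real PDF parser works.
public class Type3ExtractionDemo {

    // Hypothetical Type 3 font: character code -> what its user-defined
    // glyph procedure draws. The font stores the drawing, not its meaning.
    static final Map<Character, String> GLYPH_PROCS = Map.of(
            'D', "delta-shaped glyph",
            'S', "sigma-shaped glyph",
            'P', "'The Artist Formerly Known As Prince' glyph");

    // Hypothetical hand-made glyph-to-Unicode hints. Delta and sigma have
    // code points; there is nothing to map the Prince symbol to.
    static final Map<Character, String> UNICODE_HINTS = Map.of(
            'D', "\u0394",  // GREEK CAPITAL LETTER DELTA
            'S', "\u03A3"); // GREEK CAPITAL LETTER SIGMA

    // A viewer just runs the glyph procedure for each code, so the page looks right.
    static String render(String codes) {
        StringBuilder sb = new StringBuilder();
        for (char c : codes.toCharArray()) {
            if (sb.length() > 0) sb.append(", ");
            sb.append(GLYPH_PROCS.getOrDefault(c, "?"));
        }
        return sb.toString();
    }

    // An extractor only sees the character codes. Without a hint, the best it
    // can return is the raw code, which says nothing about the glyph's shape.
    static String extract(String codes) {
        StringBuilder sb = new StringBuilder();
        for (char c : codes.toCharArray()) {
            sb.append(UNICODE_HINTS.getOrDefault(c, String.valueOf(c)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String codes = "DSP";
        System.out.println("Rendered:  " + render(codes));
        System.out.println("Extracted: " + extract(codes));
    }
}
```

With the input "DSP", extract() returns "ΔΣP": delta and sigma only come
back because someone wrote the hint table by hand, and the Prince glyph
comes back as a plain 'P'.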

I think you've hit a limitation regarding text extraction in general.
-- 
This answer is provided by 1T3XT BVBA
http://www.1t3xt.com/ - http://www.1t3xt.info

_______________________________________________
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions