There are two ways to handle Type 3 encodings.

1) If it's a newer Type 3 font with an associated ToUnicode table, use that table - that's easy ;).
2) Look up the name of the glyph (the key in the CharProcs dictionary) in the Adobe Glyph List (<http://en.wikipedia.org/wiki/Adobe_Glyph_List>), which maps standard names to Unicode values.

Leonard

-----Original Message-----
From: Kevin Day [mailto:ke...@trumpetinc.com]
Sent: Monday, June 21, 2010 5:52 PM
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] NPE while Extracting text

The trick here is obtaining a mapping between the type 3 font glyphs and some sort of encoded text. There are several ways that this can be done, and they are fairly well supported by the text parser - but type 3 fonts, as has been mentioned, don't *usually* have this sort of mapping information.

I know a lot of the PDF specification, but I don't know all of it - and it's quite possible that there is some mechanism for obtaining this sort of mapping. I guess the first thing to do is to ask whether Acrobat can figure the text out for these fonts (can you highlight the text, then copy and paste it into a text editor?). If it can, then it's time to dig into the PDF spec and figure out if there is some mapping strategy that isn't being handled by CMapAwareDocumentFont.

What it sounds like to me is that the string that is passed into decode() is actually correct. Interestingly, looking at the font definition that you provide, there is a dictionary entry for Encoding. I think that this is where careful reading of the PDF spec is going to be required - so here are some resources to get you started:

Here's the spec: http://www.adobe.com/devnet/acrobat/pdfs/PDF32000_2008.pdf

Section 9.6.5 discusses type 3 font dictionaries. I note that Type 3 fonts *can* have a ToUnicode entry. And they have an Encoding entry. So these sure sound an awful lot like Type 1 fonts as far as text extraction is concerned.
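
To make the two strategies in this thread concrete, here is a rough Java sketch of the lookup order: try the ToUnicode mapping first, then fall back to the glyph name and the Adobe Glyph List. Everything in it is an assumption for illustration only: the class Type3GlyphLookup, the decode() helper, and the plain Maps standing in for a parsed ToUnicode CMap and for the code-to-glyph-name mapping from the font's Encoding entry are hypothetical and are not iText API. The glyph list is a five-entry excerpt, and only the single-group "uniXXXX" naming form is handled.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of iText: resolves a character code of a
// Type 3 font to text using the two strategies discussed in this thread.
final class Type3GlyphLookup {

    // Tiny hand-picked excerpt of the Adobe Glyph List (glyph name -> code point).
    // A real implementation would load the complete list instead.
    private static final Map<String, Integer> AGL_EXCERPT = new HashMap<>();
    static {
        AGL_EXCERPT.put("A", 0x0041);
        AGL_EXCERPT.put("a", 0x0061);
        AGL_EXCERPT.put("space", 0x0020);
        AGL_EXCERPT.put("Adieresis", 0x00C4);
        AGL_EXCERPT.put("eacute", 0x00E9);
    }

    // toUnicode:       assumed to be the already-parsed ToUnicode mapping (may be empty)
    // codeToGlyphName: assumed to come from the font's Encoding entry; the names
    //                  are the same ones used as keys in CharProcs
    // Returns null when neither strategy yields a mapping.
    static String decode(int code,
                         Map<Integer, String> toUnicode,
                         Map<Integer, String> codeToGlyphName) {
        // Strategy 1: a ToUnicode entry wins whenever it exists.
        String direct = toUnicode.get(code);
        if (direct != null) {
            return direct;
        }

        // Strategy 2: look the glyph name up in the glyph list.
        String glyphName = codeToGlyphName.get(code);
        if (glyphName == null) {
            return null;
        }
        Integer codePoint = AGL_EXCERPT.get(glyphName);
        if (codePoint == null && glyphName.matches("uni[0-9A-F]{4}")) {
            // "uniXXXX" names carry the code point directly (single group only here).
            codePoint = Integer.parseInt(glyphName.substring(3), 16);
        }
        return codePoint == null ? null : new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        Map<Integer, String> toUnicode = new HashMap<>();
        toUnicode.put(1, "A");

        Map<Integer, String> glyphNames = new HashMap<>();
        glyphNames.put(2, "eacute");
        glyphNames.put(3, "uni20AC");
        glyphNames.put(4, "g17"); // arbitrary name, typical of generated Type 3 fonts

        for (int code = 1; code <= 4; code++) {
            System.out.println(code + " -> " + decode(code, toUnicode, glyphNames));
        }
    }
}
```

Note the last case: a glyph with an arbitrary name such as "g17" and no ToUnicode entry cannot be resolved by either strategy, which is consistent with the point above that type 3 fonts don't *usually* carry this mapping information.
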

From a debugging perspective, I think that the next step is to do a debug walkthrough with a document containing a normal Type 1 font, and compare that with the walkthrough of your document with the Type 3 font. You may find that there's something subtle that can be tweaked to make this work.

Please let me know what you find!

--
View this message in context: http://itext-general.2136553.n4.nabble.com/NPE-while-Extracting-text-tp2256512p2262853.html
Sent from the iText - General mailing list archive at Nabble.com.

------------------------------------------------------------------------------
ThinkGeek and WIRED's GeekDad team up for the Ultimate GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the lucky parental unit. See the prize list and enter to win: http://p.sf.net/sfu/thinkgeek-promo
_______________________________________________
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions

Buy the iText book: http://www.itextpdf.com/book/
Check the site with examples before you ask questions: http://www.1t3xt.info/examples/
You can also search the keywords list: http://1t3xt.info/tutorials/keywords/