Hello,

I came across a PDF file that does not embed its fonts (not even as subsets), 
and as a result I could not extract text from it.

I inspected the corresponding font dictionary and found the following 
(in this pseudocode, indirect references are replaced by the PDF objects they 
point to, and dictionary entries are shown as /Name=VALUE):
<<C2_0
/Type=/Font
/BaseFont=/MingLiU
/Subtype=/CIDFontType2
   <<CIDSystemInfo
   /Ordering="CNS1"
   /Registry="Adobe"
   /Supplement=3
   >>
   <<FontDescriptor
   /FontName=/MingLiU
   /Lang=/zh-TW
   /FontFamily=/MingLiU
   % there's no FontFile2 or other embedded font resource in the
   % FontDescriptor dictionary
   >>
>>

How can I extract text from this kind of document?
There is an Asian pack on SourceForge that contains a CNS1 file, which may be 
useful for this kind of font decoding, but I don't know how to use it. If 
anyone has done something similar, please give me a hint.
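
To make the question concrete, here is a rough sketch of what decoding such a font would involve. It is not iText code and not a known solution. It assumes the font is used with a 2-byte identity encoding such as Identity-H (the dictionary above does not say which encoding is used), so that each 2-byte code in a text string is itself a CID, and that a CID-to-Unicode table for the Adobe-CNS1 collection is available. The function names and the tiny table below are hypothetical, and the CID numbers in the table are placeholders, not real Adobe-CNS1 values.

```python
# Toy CID -> Unicode table. The CID numbers are placeholders chosen for
# illustration only; a real table for Adobe-CNS1 would come from an external
# resource, since the PDF itself embeds neither the font nor a mapping.
TOY_CID_TO_UNICODE = {
    1: "\u4e2d",
    2: "\u6587",
}


def codes_to_cids(raw):
    """Split a PDF string into 2-byte big-endian codes.

    Assumption: the font uses an identity encoding, so code == CID.
    """
    if len(raw) % 2 != 0:
        raise ValueError("expected an even number of bytes for 2-byte codes")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


def cids_to_text(cids, table, missing="\ufffd"):
    """Map CIDs to Unicode, substituting U+FFFD for CIDs not in the table."""
    return "".join(table.get(cid, missing) for cid in cids)


if __name__ == "__main__":
    cids = codes_to_cids(b"\x00\x01\x00\x02")
    print(cids_to_text(cids, TOY_CID_TO_UNICODE))
```

If the font is used with a different predefined CMap, the first step (bytes to CIDs) would need that CMap as well, so this sketch covers only the simplest case.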
_______________________________________________
iText-questions mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/itext-questions