Only CJK Unicode cmaps are supported in text extraction. There's no major difficulty in
supporting other cmaps; it just hasn't been done.
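To see why Unicode cmaps are the easy case, here is a minimal Java sketch. It is not iText's API: the class and the methods `decodeUcs2` and `decodeWithTable` are hypothetical names used only for illustration. The sketch assumes you already have the raw 2-byte character codes from a `Tj`/`TJ` string. With a UCS-2 based cmap such as UniCNS-UCS2-H, those codes are already UTF-16 code units, so no font data is needed. With any other cmap, each code first has to be looked up in a table. The single table entry below is a hypothetical placeholder, not real mapping data.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

class CMapDecodeSketch {

    // UCS-2 based cmap (e.g. UniCNS-UCS2-H): each 2-byte code is itself a
    // UTF-16 code unit, so decoding the bytes as UTF-16BE yields the text.
    static String decodeUcs2(byte[] codes) {
        return new String(codes, StandardCharsets.UTF_16BE);
    }

    // Non-Unicode cmap: each 2-byte code needs a lookup. This hypothetical
    // table maps codes straight to Unicode code points; a real implementation
    // would chain code -> CID (from the cmap) and CID -> Unicode (from the
    // ordering's UCS2 mapping). A trailing odd byte is ignored.
    static String decodeWithTable(byte[] codes, Map<Integer, Integer> codeToUnicode) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i + 1 < codes.length; i += 2) {
            int code = ((codes[i] & 0xFF) << 8) | (codes[i + 1] & 0xFF);
            Integer cp = codeToUnicode.get(code);
            // Unmapped codes become U+FFFD (replacement character).
            sb.appendCodePoint(cp != null ? cp : 0xFFFD);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Codes 0x4E2D 0x6587 under a UCS-2 cmap decode to U+4E2D U+6587.
        byte[] ucs2 = {(byte) 0x4E, (byte) 0x2D, (byte) 0x65, (byte) 0x87};
        System.out.println(decodeUcs2(ucs2));

        // Hypothetical table entry, for illustration only.
        Map<Integer, Integer> table = Map.of(0xA440, 0x4E00);
        byte[] other = {(byte) 0xA4, (byte) 0x40, (byte) 0xA4, (byte) 0x41};
        System.out.println(decodeWithTable(other, table));
    }
}
```

Supporting another cmap therefore comes down to loading its code-to-CID table and the ordering's CID-to-Unicode table. The glyph outlines are never consulted.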

Paulo

________________________________
From: 1T3XT BVBA [mailto:[email protected]]
Sent: Monday, November 07, 2011 10:35 AM
To: Post all your questions about iText here
Subject: Re: [iText-questions] How to extract text of CNS1 ordering without embedded font resource

On 7/11/2011 0:43, WMJ wrote:
Hello,

I came across a PDF file that does not embed its font subsets, and as a result I
failed to extract text from it.

The fact that a font isn't embedded doesn't mean you can't extract text. Text
extraction doesn't need to know what a glyph looks like; it only needs to know
the correct Unicode value of each character. I don't know if iText is already
able to handle this kind of CJK font. Can you share a sample PDF that fails, so
that we can take a look at it?
_______________________________________________
iText-questions mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/itext-questions

iText(R) is a registered trademark of 1T3XT BVBA.
Many questions posted to this list can (and will) be answered with a reference 
to the iText book: http://www.itextpdf.com/book/
Please check the keywords list before you ask for examples: 
http://itextpdf.com/themes/keywords.php
