[iText-questions] [SPAM] Re: Differences btw text extraction from iText and Acrobat Reader?

mkl Wed, 20 Mar 2013 05:18:04 -0700

wwkloo,

wwkloo wrote
> When create the PDF with another program, the text can be extracted by
> iText and Acrobat Reader XI correctly.
> - 1: 0xD841 0xDD47
> - 2: 0x92DB
> 
> However, the character is not displayed correctly. :(
> iTextExtract_O.pdf
> <http://itext-general.2136553.n4.nabble.com/file/n4657858/iTextExtract_O.pdf> 
>


This other program seems to only know Unicode 1.x and, therefore, only
code points below 0x10000. Thus, it treats its input 0xD841 0xDD47 as two
separate characters and not as the surrogate pair that encodes the single
code point 0x20547.
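
To illustrate the arithmetic, here is a minimal Java sketch of how the two
UTF-16 code units map to one code point. The class and method names are
made up for this example and are not part of iText; only
Character.toCodePoint and String.codePointCount are standard JDK calls.

```java
// Sketch: how a UTF-16 surrogate pair maps to one supplementary code point.
// SurrogatePairDemo and combine() are hypothetical names for illustration.
final class SurrogatePairDemo {

    // Manual decoding of a high/low surrogate pair as defined for UTF-16.
    static int combine(char high, char low) {
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    public static void main(String[] args) {
        char high = (char) 0xD841;
        char low = (char) 0xDD47;

        // Both lines print U+20547.
        System.out.printf("manual:  U+%X%n", combine(high, low));
        System.out.printf("library: U+%X%n", Character.toCodePoint(high, low));

        // A program that only counts 16-bit units sees two "characters",
        // while a code-point-aware program sees one.
        String s = new String(new char[] {high, low});
        System.out.println("UTF-16 units: " + s.length());
        System.out.println("code points:  " + s.codePointCount(0, s.length()));
    }
}
```

A program that stops at the 16-bit units would look up 0xD841 and 0xDD47
individually, which probably explains the wrong glyphs in the display.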

wwkloo wrote
> Please help!

I'm sure the iText developers responsible for porting the Java version to
.NET will look into the handling of Unicode characters beyond the Basic
Multilingual Plane sometime soon.

Regards,   Michael



