Hi all,
I searched the forum but could find neither a similar problem nor the right
technical terms to describe it.
I am using iText to extract text from PDF files.
Everything was working fine until I started processing PDF files generated
by an HP PS driver. Note that I cannot change the driver and have no
choice but to process the generated files. Also note that the problem
remains when I test with PDFCreator.
The PDFs open and display fine in any PDF reader. But when parsing with
iText, only strange characters appear. I get the same result when copying
the text to the clipboard and pasting it into Notepad.
At first I thought it was a problem with the font type, encoding, etc. But
after analyzing the output, I noticed the following pattern: apparently the
HP PS driver assigns ASCII code 33 (!) to the first character, 34 (") to the
second, 35 (#) to the third, and so on. Please see the ASCII table* below.
Example: if the first string in the PDF is 'whatever', it comes out as
follows (the repeated 'e' gets the same code both times):
!"#$%&%'
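To make the pattern concrete, here is a minimal Java sketch of what I think
the driver is doing. This is only my guess from looking at the output: the
class and method names are made up for illustration, and I am assuming that
each distinct character gets the next code, starting at 33, in order of
first appearance.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical reconstruction of the observed pattern; not taken from the driver.
class SubsetEncodingSketch {

    // Assign codes 33, 34, 35, ... to each distinct character,
    // in the order in which the characters first appear in the text.
    static Map<Character, Character> buildTable(String text) {
        Map<Character, Character> table = new LinkedHashMap<>();
        char next = 33; // ASCII '!'
        for (char c : text.toCharArray()) {
            if (!table.containsKey(c)) {
                table.put(c, next++);
            }
        }
        return table;
    }

    // Replace every character with the code assigned to it in the table.
    static String encode(String text, Map<Character, Character> table) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            char code = table.get(c);
            sb.append(code);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "hello";
        Map<Character, Character> table = buildTable(original);
        System.out.println(encode(original, table)); // prints !"##$
    }
}
```

The output of this sketch has the same shape as what I get from iText and
from copy/paste: the codes say nothing about the original letters unless
you also have the table.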
I cannot figure out the reason for this behavior. Is it familiar to anyone?
Since the PDF content displays normally in all the PDF readers I've tested,
I believe there must be some sort of mapping table embedded in the PDF that
the readers use to map the characters. Does anyone know whether iText can
load this table or handle the mapping natively?
Any help is very much appreciated.
Best regards,
-EL
----
*http://www.table-ascii.com/