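I haven't seen the actual PDF, but judging from that Differences array, the answer is no: there is no (easy) way to extract the text. Glyph names such as /g48 or /g55 are not standard glyph names, so they say nothing about which character each glyph represents, and an extractor has nothing to map them to. Since copying and pasting from a viewer produces the same garbage, the file most likely has no usable ToUnicode information either. Your only option might be rasterizing the pages and running OCR on the resulting images.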
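To show why the Differences array gives this away, here is a minimal Python sketch. It is not iText code, and the function names are hypothetical. It expands the array into a code-to-glyph-name map according to the PDF encoding rules: an integer sets the code for the next name, and each following name takes the next consecutive code. It then tries to turn each name into a character. Real extractors consult the full Adobe Glyph List and any /ToUnicode CMap. The lookup below is a deliberately tiny stand-in for the glyph list only.

```python
import re
from typing import Dict, Optional

# Tokens in a /Differences array are either integers (a starting code)
# or names written as /Name; whitespace between tokens is skipped.
TOKEN = re.compile(r"(\d+)|/([^\s/\[\]<>()]+)")


def parse_differences(text: str) -> Dict[int, str]:
    """Map character codes to glyph names.

    An integer sets the code for the name that follows it, and each
    further name is assigned the next consecutive code.
    """
    mapping: Dict[int, str] = {}
    code = 0
    for number, name in TOKEN.findall(text):
        if number:
            code = int(number)
        else:
            mapping[code] = name
            code += 1
    return mapping


# Hypothetical, tiny stand-in for the Adobe Glyph List; the real list
# has thousands of entries. These are only examples.
KNOWN_NAMES = {"space": " ", "period": ".", "comma": ","}


def glyph_name_to_unicode(name: str) -> Optional[str]:
    """Return the character a glyph name denotes, or None if unknown."""
    if len(name) == 1 and name.isascii() and name.isalpha():
        return name  # single-letter names such as /A or /g
    m = re.fullmatch(r"uni([0-9A-F]{4})", name)
    if m:
        return chr(int(m.group(1), 16))
    return KNOWN_NAMES.get(name)


if __name__ == "__main__":
    # The Differences array from the original message.
    differences = (
        "1/g48/g55/g54/g3/g44/g81/g70/g82/g80/g76/g74/g86/g17/g79/g29/g36 "
        "/g87/g72/g47/g53/g73/g85/g49/g18/g51/g68/g91/g89/g88/g39/g41/g16 "
        "/g56/g38/g75"
    )
    codes = parse_differences(differences)
    unmapped = [c for c, n in codes.items() if glyph_name_to_unicode(n) is None]
    print(len(codes), len(unmapped))  # prints: 35 35
```

Every one of the 35 codes ends up with a name like g48 that carries no character information. That is why both the extractor and copy/paste can only produce garbage, unless the file supplies some other mapping.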
-----Original Message-----
From: Pakhu [mailto:[email protected]]
Sent: Wednesday, March 09, 2011 9:38 PM
To: [email protected]
Subject: [iText-questions] Unreadable Pdf with PdfTextExtractor

I have received a set of PDF files that cannot be parsed using iText's PdfTextExtractor. All the extracted characters are meaningless. I attach a sample if you want to verify it. If I copy part of the file and paste it into a text editor, I get the same messy, meaningless result.

All fonts are embedded TrueType subsets. The Differences array looks like this:

<</Type/Encoding/BaseEncoding/WinAnsiEncoding/Differences[
1/g48/g55/g54/g3/g44/g81/g70/g82/g80/g76/g74/g86/g17/g79/g29/g36
/g87/g72/g47/g53/g73/g85/g49/g18/g51/g68/g91/g89/g88/g39/g41/g16
/g56/g38/g75]>>

Is there any way I could render this file? Is there any transformation of the document that could help? I'm interested only in the text, not in its formatting.

Thanks

--
View this message in context: http://itext-general.2136553.n4.nabble.com/Unreadable-Pdf-with-PdfTextExtractor-tp3345219p3345219.html
Sent from the iText - General mailing list archive at Nabble.com.

------------------------------------------------------------------------------
Colocation vs. Managed Hosting
A question and answer guide to determining the best fit for your organization - today and in the future.
http://p.sf.net/sfu/internap-sfd2d
_______________________________________________
iText-questions mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/itext-questions

iText(R) is a registered trademark of 1T3XT BVBA.
Many questions posted to this list can (and will) be answered with a reference to the iText book: http://www.itextpdf.com/book/
Please check the keywords list before you ask for examples: http://itextpdf.com/themes/keywords.php
