Not that I know enough about PDF to contribute anything significant, but out of curiosity (which, lo and behold, killed the cat) I tried copying and pasting from PDF-XChange Viewer into Notepad, and the text was extracted beautifully (while Acrobat X gives scrambled garbage). The first few lines from Ambulo Report.pdf are below, so the information must be in there somewhere.
Kind regards,
/Gerold

Patient Information
Name                     Demo - Hypertensive Last
Primary physician
Patient ID               ID1
Date of birth            Tuesday, October 10, 1972
Height, Weight           170 cm, 78 kg
Interpreting physician

Statistical Overview
Start Time               Tuesday, January 29, 2008, 17:40
Stop Time                Wednesday, January 30, 2008, 17:10
Duration                 23 Hours
Measurements             37 Total: 37 Included, 0 Excluded, 0 Events, 0 Errors
                         Complete (37 Included, 100%)

              Min    Mean   Max   StdDev
Systolic      118   164.4   195     28.0
Diastolic      81   107.3   125     12.8
Pulse          72    82.3    92      5.3
MAP            95   124.2   146     15.1

Mean Difference between Awake and Asleep
              ∆ mmHg   % drop
Systolic        41.8     23 %
Diastolic       22.2     19 %
Pulse            0.9      1 %
MAP             25.4     19 %

Systolic > 140    70.3 %
Diastolic >

-----Original Message-----
From: mkl [mailto:m...@wir-sind-cool.org]
Sent: Thursday, 31 January 2013 13:41
To: itext-questions@lists.sourceforge.net
Subject: [iText-questions] [SPAM] Re: Not able to read text from ItextShap

Kiran Ghadge,

Kiran Ghadge wrote
> I am using iTextSharp for reading text from a PDF file.
> I have attached a sample project.
> Below is a code snippet. But I am not able to get text from the page.

The code snippet in your message collected no text and printed no text. Thus, I assume that code did not produce the output.

The code in the attached project, on the other hand, collects and outputs text from the accompanying PDF. It first does a funny conversion, though:

(Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(text))))

Such a conversion should not be necessary if the text from the PDF can be read properly.

Here is your actual problem, though: the PDF does not seem to contain the correct information for text extraction at all. Just try extracting the text with Adobe Acrobat (which is quite good at text extraction); for me it returns only assorted symbols.
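The conversion Michael calls funny can be reproduced in isolation to see why it cannot help. This is a minimal sketch in Java rather than C#; it assumes Encoding.Default is windows-1252 (a common ANSI code page on Western European .NET Framework systems), and the class and method names are hypothetical. Characters the code page can represent come through the round trip unchanged, while anything outside it (such as the ∆ in the report above) is lost, so at best the conversion is a no-op. On .NET, Windows code pages may apply a best-fit replacement instead of '?', but the original character is lost either way.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {
    // Java equivalent of the C# expression
    //   Encoding.UTF8.GetString(ASCIIEncoding.Convert(
    //       Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(text)))
    // where 'ansi' stands in for Encoding.Default (an assumption, see main).
    static String funnyConversion(String text, Charset ansi) {
        // Encoding.Default.GetBytes(text): unmappable characters become '?'
        byte[] ansiBytes = text.getBytes(ansi);
        // Encoding.Convert(Default, UTF8, ...): decode as ANSI, re-encode as UTF-8
        byte[] utf8Bytes = new String(ansiBytes, ansi).getBytes(StandardCharsets.UTF_8);
        // Encoding.UTF8.GetString(...): decode the UTF-8 bytes again
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Assumption: the system ANSI code page is windows-1252.
        Charset ansi = Charset.forName("windows-1252");

        String plain = "Systolic 118 164.4 195";
        String withDelta = "\u2206 mmHg % drop"; // U+2206 INCREMENT, as in the report

        System.out.println(funnyConversion(plain, ansi).equals(plain)); // prints true
        System.out.println(funnyConversion(withDelta, ansi));           // prints "? mmHg % drop"
    }
}
```

No re-encoding of an already decoded string can restore characters the PDF never mapped to Unicode in the first place, which is why the real problem lies in the PDF itself.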
Therefore, I'm afraid that for PDFs like this one you either have to resort to a custom extraction routine with a very special byte-to-text conversion, or you have to use OCR.

Regards,
Michael

--
View this message in context: http://itext-general.2136553.n4.nabble.com/Not-able-to-read-text-from-ItextShap-tp4657491p4657496.html
Sent from the iText - General mailing list archive at Nabble.com.

------------------------------------------------------------------------------
Everyone hates slow websites. So do we.
Make your web apps faster with AppDynamics
Download AppDynamics Lite for free today:
http://p.sf.net/sfu/appdyn_d2d_jan
_______________________________________________
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions

iText(R) is a registered trademark of 1T3XT BVBA.
Many questions posted to this list can (and will) be answered with a reference to the iText book: http://www.itextpdf.com/book/
Please check the keywords list before you ask for examples: http://itextpdf.com/themes/keywords.php
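Michael's first option, a custom extraction routine with a special byte-to-text conversion, boils down to mapping the raw character codes shown on the page to text yourself, using a table worked out for that particular font (for example by comparing the rendered glyphs with their codes). The Java sketch below shows only that mapping step. It assumes single-byte character codes, and the table entries, class, and method names are all hypothetical; getting the raw bytes out of the page content is not shown.

```java
import java.util.Map;

public class CustomGlyphMapping {
    // Hypothetical code-to-text table for one font in one PDF. Real entries would
    // have to be worked out by hand, e.g. by comparing rendered glyphs with their codes.
    static final Map<Integer, String> HYPOTHETICAL_TABLE = Map.of(
            0x01, "P",
            0x02, "u",
            0x03, "l",
            0x04, "s",
            0x05, "e");

    // Decodes raw string bytes as single-byte codes (an assumption: composite
    // fonts use multi-byte codes). Unknown codes become U+FFFD so gaps stay visible.
    static String decode(byte[] raw, Map<Integer, String> table) {
        StringBuilder text = new StringBuilder(raw.length);
        for (byte b : raw) {
            int code = b & 0xFF; // treat the byte as unsigned
            text.append(table.getOrDefault(code, "\uFFFD"));
        }
        return text.toString();
    }

    public static void main(String[] args) {
        byte[] raw = {0x01, 0x02, 0x03, 0x04, 0x05};
        System.out.println(decode(raw, HYPOTHETICAL_TABLE)); // prints "Pulse"
    }
}
```

Using strings rather than single characters as table values leaves room for glyphs, such as ligatures, that stand for more than one character.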