No, it's not possible (short of OCR) as the software that produced this PDF 
didn't encode any useful text information - only displayable glyphs.
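
If it helps, one rough way to spot this situation in code before falling back to OCR is to check whether the string returned by GetTextFromPage is mostly readable characters. This is only a sketch under assumptions: the ExtractionCheck class, its two methods, and the 0.9 threshold are hypothetical and not part of iTextSharp, and the check will miss PDFs whose glyph codes happen to land on ordinary letters.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of iTextSharp.
public static class ExtractionCheck
{
    // Fraction of characters that are letters, digits, punctuation,
    // symbols, or whitespace. Returns 0.0 for null or empty input.
    public static double ReadableRatio(string extracted)
    {
        if (string.IsNullOrEmpty(extracted)) return 0.0;
        int readable = extracted.Count(c =>
            char.IsLetterOrDigit(c) || char.IsPunctuation(c) ||
            char.IsSymbol(c) || char.IsWhiteSpace(c));
        return (double)readable / extracted.Length;
    }

    // True when the ratio meets the threshold (0.9 is an arbitrary
    // assumption; tune it for your documents).
    public static bool LooksLikeRealText(string extracted, double threshold = 0.9)
    {
        return ReadableRatio(extracted) >= threshold;
    }
}

// Possible use with iTextSharp ("input.pdf" is a placeholder path):
//   var reader = new iTextSharp.text.pdf.PdfReader("input.pdf");
//   string text = iTextSharp.text.pdf.parser.PdfTextExtractor.GetTextFromPage(reader, 1);
//   reader.Close();
//   if (!ExtractionCheck.LooksLikeRealText(text)) { /* hand the page to OCR */ }
```

A low ratio only suggests that the page carries glyph codes without a usable mapping back to characters; it does not recover the text.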

Leonard

From: Paul Durrant [mailto:paul.durr...@clarksons.com]
Sent: Thursday, September 02, 2010 12:48 PM
To: 'itext-questions@lists.sourceforge.net'
Subject: [iText-questions] Itextsharp extact text



I'm trying to use
iTextSharp.text.pdf.parser.PdfTextExtractor.GetTextFromPage(reader, 1);
on the attached PDF, but I don't get the text back. If I take the byte array
and look at the contents, the coordinate structure of each text block is
correct, but the text itself is not in ASCII form, e.g. anything between the
() is not ASCII. How is it possible to get the text from this PDF?



thanks Paul




