No, it's not possible (short of OCR), because the software that produced this PDF
didn't encode any usable text information, only displayable glyphs.
Leonard
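Leonard's point is that the bytes between ( and ) are glyph codes, not characters. The small self-contained C# sketch below illustrates this. It does not use iTextSharp. `GlyphDecoding` and `Decode` are hypothetical names, and the dictionary is an assumed stand-in for a font's ToUnicode CMap (or a known encoding). If a mapping is present, the codes decode to text. If it is absent, an extractor has nothing to recover, which is why OCR is the remaining option.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical illustration, not iTextSharp code: the bytes inside a PDF string
// operand such as (\001\002\003) are font-specific glyph codes, not ASCII.
// They can be turned into text only if the font supplies a code-to-Unicode
// mapping, modelled here as a dictionary standing in for a ToUnicode CMap.
public static class GlyphDecoding
{
    // Returns the decoded text, or null when the font carries no mapping at all.
    public static string Decode(byte[] codes, IDictionary<byte, char> toUnicode)
    {
        if (toUnicode == null)
        {
            return null; // only displayable glyphs: nothing to extract
        }

        var text = new StringBuilder();
        foreach (byte code in codes)
        {
            char ch;
            // Codes missing from the map become U+FFFD (replacement character).
            text.Append(toUnicode.TryGetValue(code, out ch) ? ch : '\uFFFD');
        }
        return text.ToString();
    }
}

public static class Program
{
    public static void Main()
    {
        // Raw bytes as they might appear between ( and ) in a content stream.
        byte[] codes = { 0x01, 0x02, 0x03 };

        // Hypothetical mapping that a well-behaved PDF producer would embed.
        var toUnicode = new Dictionary<byte, char> { { 0x01, 'H' }, { 0x02, 'i' }, { 0x03, '!' } };

        Console.WriteLine(GlyphDecoding.Decode(codes, toUnicode));           // Hi!
        Console.WriteLine(GlyphDecoding.Decode(codes, null) ?? "(no text)"); // (no text)
    }
}
```

The same three bytes give "Hi!" when a mapping exists and nothing when it does not. This matches what happens when GetTextFromPage meets a PDF like this one.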
From: Paul Durrant [mailto:paul.durr...@clarksons.com]
Sent: Thursday, September 02, 2010 12:48 PM
To: 'itext-questions@lists.sourceforge.net'
Subject: [iText-questions] Itextsharp extract text
I'm trying to use
iTextSharp.text.pdf.parser.PdfTextExtractor.GetTextFromPage(reader, 1);
on the attached PDF, but I don't get the text back. If I take the byte array and
look at the contents, the text blocks are not in ASCII form, although all the
coordinate structure is correct; e.g. anything between the () is not ASCII.
How is it possible to get the text from this PDF?
Thanks, Paul
_______________________________________________
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions