No, but you can extract the actual text together with its coordinates and
work out what to discard.
Note that there are two things that look like text but are not
characters-in-the-content-stream:
1) Images with text. You need OCR.
2) Paths (lines) in the shape of characters. You need OCR.
In the general case, you need OCR. Your specific case may be able to use
PdfContentStreamProcessor and its related classes. Check out the source for
SimpleTextExtractingPdfContentStreamProcessor, and adapt it to capture
coordinates as well.
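
To make the "extract with coordinates, then discard" idea concrete, here is
a minimal sketch in plain Java of the discard step only. It assumes you have
already collected each run of text with the start point of its baseline
(for example from your own content-stream processor). TextChunk and
RegionFilter are hypothetical names invented for this illustration; they are
not part of iText or iTextSharp. Coordinates are in PDF user space, where the
origin is at the lower-left corner of the page and y grows upward.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical holder for one extracted run of text and its baseline start point. */
final class TextChunk {
    final String text;
    final float x; // PDF user-space units, origin at the lower-left of the page
    final float y;

    TextChunk(String text, float x, float y) {
        this.text = text;
        this.x = x;
        this.y = y;
    }
}

/** Hypothetical helper: keeps only the chunks that start inside a rectangle. */
final class RegionFilter {
    static String textInRegion(List<TextChunk> chunks,
                               float llx, float lly, float urx, float ury) {
        // Discard every chunk whose start point lies outside the rectangle.
        List<TextChunk> kept = new ArrayList<>();
        for (TextChunk c : chunks) {
            if (c.x >= llx && c.x <= urx && c.y >= lly && c.y <= ury) {
                kept.add(c);
            }
        }
        // PDF y grows upward, so higher y comes first; then left to right.
        kept.sort(Comparator.comparingDouble((TextChunk c) -> -c.y)
                            .thenComparingDouble(c -> c.x));
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < kept.size(); i++) {
            if (i > 0) {
                // Exact baseline match keeps the sketch short; real pages
                // would likely need a small tolerance here.
                out.append(kept.get(i).y == kept.get(i - 1).y ? ' ' : '\n');
            }
            out.append(kept.get(i).text);
        }
        return out.toString();
    }
}
```

Calling RegionFilter.textInRegion(chunks, 50f, 600f, 300f, 750f) would return
only the text whose baseline starts inside that rectangle, ordered top to
bottom and left to right. Testing only the start point is a simplification:
a chunk that begins outside the region and runs into it is dropped, so you
may prefer to test each chunk's full bounding box instead.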
--Mark Storer
Senior Software Engineer
Cardiff.com
import legalese.Disclaimer;
Disclaimer<Cardiff> DisCard = null;
________________________________
From: Zain ul Abideen [mailto:[email protected]]
Sent: Wednesday, June 30, 2010 11:11 AM
To: [email protected]
Subject: [iText-questions] Data mining with iTextSharp
Hello all,
Is it possible to extract text from a specific region of a PDF? What I
mean is, can we define the coordinates of a rectangle, or use some other
way, and then extract text from that specific region?
Regards,
Zain