No, not directly, but you can extract the actual text together with its
coordinates and work out what to discard.

Note that there are two things that look like text but are not
characters-in-the-content-stream:

1) Images with text.  You need OCR.

2) Paths (lines) in the shape of characters.  You need OCR.

In general, you need OCR.  Your specific case may be able to use
PdfContentStreamProcessor and its related classes.  Check out the source
for SimpleTextExtractingPdfContentStreamProcessor, and adapt it to capture
coordinates as well.
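
Once you have text chunks with their coordinates, the "figure out what to
discard" step could look roughly like the sketch below.  RegionFilter and
Chunk are made-up names for illustration, not iText or iTextSharp classes,
and the sketch assumes each chunk carries the lower-left point of its
baseline in PDF user space.  How you fill the list depends on your own
content stream processor.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of iText or iTextSharp.
final class RegionFilter {

    // One piece of extracted text plus an anchor point in PDF user space
    // (origin at the bottom-left of the page, y grows upward).
    static final class Chunk {
        final String text;
        final float x;
        final float y;

        Chunk(String text, float x, float y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    // Keeps only the chunks whose anchor point lies inside the rectangle
    // with lower-left corner (llx, lly) and upper-right corner (urx, ury),
    // then joins them top to bottom and left to right.
    static String textInRegion(List<Chunk> chunks,
                               float llx, float lly, float urx, float ury) {
        List<Chunk> kept = new ArrayList<>();
        for (Chunk c : chunks) {
            if (c.x >= llx && c.x <= urx && c.y >= lly && c.y <= ury) {
                kept.add(c);
            }
        }
        // Higher y means higher on the page, so sort by descending y first.
        kept.sort(Comparator.comparingDouble((Chunk c) -> -c.y)
                            .thenComparingDouble(c -> c.x));

        StringBuilder sb = new StringBuilder();
        for (Chunk c : kept) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(c.text);
        }
        return sb.toString();
    }
}
```

Testing only the anchor point is the simplest possible rule.  A chunk that
starts inside the rectangle and runs past its edge is still kept whole, so
you may want to compare the full bounding box instead if that matters for
your documents.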

--Mark Storer

  Senior Software Engineer

  Cardiff.com

 

import legalese.Disclaimer;

Disclaimer<Cardiff> DisCard = null;

 

________________________________

From: Zain ul Abideen [mailto:[email protected]] 
Sent: Wednesday, June 30, 2010 11:11 AM
To: [email protected]
Subject: [iText-questions] Data mining with iTextSharp

 

Hello all,
Is it possible to extract text from a specific region of a PDF? That is,
can we define the coordinates of a rectangle, or specify the region some
other way, and then extract the text from that region?

Regards,
Zain

_______________________________________________
iText-questions mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/itext-questions