Re: [iText-questions] Extracting text from PDF

Kevin Day Thu, 04 Apr 2013 22:00:10 -0700

Nope.  PDF isn't a structured format. A page generally records where text is
drawn, not paragraphs, tables, or reading order, so extracting structured
text is a very difficult challenge.  If the files all follow a similar
layout, you may be able to use that knowledge to derive an algorithm that can
do it (see LocationTextExtractionStrategy as a starting point).  There have
been other posts about this; search the listserv archives and you'll probably
find other responses I've made to similar questions (I recall taking the time
to outline the strategy that such an algorithm might use).
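
To make the idea of a layout-based algorithm concrete, here is a rough sketch
in plain Java.  It is not iText code and not the strategy from the earlier
posts: the Chunk class, the extract method, and the two thresholds
(lineTolerance, spaceGap) are all hypothetical names made up for illustration.
It assumes a parser has already reported each piece of text with its left
edge, baseline, and width, and it simply groups pieces that share a baseline
into lines and then orders each line left to right.

```java
import java.util.ArrayList;
import java.util.List;

class LayoutTextSketch {

    /** A positioned piece of text; a hypothetical stand-in for parser output. */
    static final class Chunk {
        final String text;
        final float x;      // left edge of the text
        final float y;      // baseline; PDF y coordinates grow upward
        final float width;  // horizontal extent of the text

        Chunk(String text, float x, float y, float width) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.width = width;
        }
    }

    /**
     * Rebuilds reading order from positions alone.
     * lineTolerance: maximum baseline difference still treated as one line (assumed).
     * spaceGap: horizontal gap above which a space is inserted (assumed).
     */
    static String extract(List<Chunk> chunks, float lineTolerance, float spaceGap) {
        List<Chunk> sorted = new ArrayList<>(chunks);
        // Highest baseline first, i.e. top of the page first.
        sorted.sort((a, b) -> Float.compare(b.y, a.y));

        // Group chunks whose baseline is close to the first chunk of the current line.
        List<List<Chunk>> lines = new ArrayList<>();
        for (Chunk c : sorted) {
            List<Chunk> current = lines.isEmpty() ? null : lines.get(lines.size() - 1);
            if (current == null || Math.abs(current.get(0).y - c.y) > lineTolerance) {
                current = new ArrayList<>();
                lines.add(current);
            }
            current.add(c);
        }

        StringBuilder out = new StringBuilder();
        for (List<Chunk> line : lines) {
            // Within a line, read left to right.
            line.sort((a, b) -> Float.compare(a.x, b.x));
            if (out.length() > 0) {
                out.append('\n');
            }
            Chunk prev = null;
            for (Chunk c : line) {
                // A visible gap between neighbouring chunks becomes a space.
                if (prev != null && c.x - (prev.x + prev.width) > spaceGap) {
                    out.append(' ');
                }
                out.append(c.text);
                prev = c;
            }
        }
        return out.toString();
    }
}
```

Knowledge of the specific file layout would go on top of this, for example
treating everything to the right of a known x position as a value column.
The thresholds would need tuning per document family.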





