[iText-questions] Reading and Extracting Text from PDF

Richard Braman Tue, 14 Feb 2006 10:48:10 -0800

I have an open source project that is attempting to structure
IRS-produced documents, such as publications and instructions, and parse
out data that is critical to building tax software.
An example of such a file is http://www.irs.gov/pub/irs-pdf/p1346.pdf.
This file contains e-file record layouts, which start on page 398.  The
IRS used to publish this as text, which made parsing relatively easy, but
now it comes in PDF only, so the project needs good open source PDF
parsing technology.  Is iText the right tool for this job?  I have seen
it do good work parsing the metadata contained in IRS fill-in forms.
 
 
Richard Braman
mailto:[EMAIL PROTECTED]
561.748.4002 (voice)


http://www.taxcodesoftware.org
Free Open Source Tax Software



_______________________________________________
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions
