I basically run our PDFs through a PDF-to-text converter and extract the
data from the resulting text files. It is pretty simple.
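For what it's worth, here is a minimal sketch of the second half of that approach, assuming the converted text has each field on its own line as "Label: value". That layout, the sample text, and the helper name extract_fields are made up for illustration; real converter output will likely need a different pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: pull "Label: value" pairs out of converted text.
# Returns a hash containing only the labels that were actually found.
sub extract_fields {
    my ($text, @labels) = @_;
    my %found;
    for my $label (@labels) {
        # Find the label at the start of a line (leading whitespace allowed),
        # then capture the rest of that line without surrounding blanks.
        if ($text =~ /^[ \t]*\Q$label\E[ \t]*:[ \t]*(.*?)[ \t]*$/m) {
            $found{$label} = $1;
        }
    }
    return %found;
}

# Made-up sample standing in for the converter's output.
my $text = <<'END';
Order Form
Name: Jane Doe
Email:   jane@example.com
END

my %fields = extract_fields($text, 'Name', 'Email');
print "$_ = $fields{$_}\n" for sort keys %fields;
```

The text itself has to come from a converter first. One option (an assumption on my part, since no specific tool is named above) is the pdftotext command-line utility, e.g. "pdftotext -layout form.pdf form.txt", after which you read form.txt into $text.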
On 3/3/2011 6:21 AM, Mike Blezien wrote:
----- Original Message ----- From: "shawn wilson" <ag4ve...@gmail.com>
Cc: "Perl List" <beginners@perl.org>
Sent: Thursday, March 03, 2011 6:04 AM
Subject: Re: Extracting Data from PDF files
On Mar 3, 2011 6:35 AM, "Mike Blezien" <mick...@frontiernet.net> wrote:
----- Original Message ----- From: "shawn wilson" <ag4ve...@gmail.com>
Cc: "Perl List" <beginners@perl.org>
Sent: Thursday, March 03, 2011 5:22 AM
Subject: Re: Extracting Data from PDF files
On Mar 3, 2011 6:07 AM, "Mike Blezien" <mick...@frontiernet.net> wrote:
Hello,
I posted a question earlier about creating a PDF file from a PDF form
submission, which we now have working. We are able to create the PDF
file to be attached to an email.
The issue I'm having now is extracting some specific data from the PDF
files created. We need to extract a couple of form field values from
each file. I've been reviewing the various PDF modules and haven't been
able to figure it out. The modules I've been looking at are
PDF::API2::Simple and PDF::FDF::Simple. These seem to just create/edit
PDF files, but I need to extract specific data from the created PDF
file. Is there another way to do this with these modules, or some other
method?
Maybe I'm missing something, but why don't you just dump all of the
form data into a db? Then you can create as many PDFs as you like. I
mean, I've used a PDF scraping module (you can even do OCR with one),
but it isn't fun because the data is generally not nicely formatted for
this. This probably isn't the case for you, but who cares, because you
have access to the pre-processed data.
Shawn,
You mean dump it into a database (db)? The data is mostly all binary, so
I'm not sure how you'd "scrape" it to extract the data, but I'm not real
familiar with this approach :)
You said your data was coming from a PDF form, right? I've never done
this per se; however, IIRC, the data is posted to a db, a web CGI, or a
text file. If this is the case, why not get the text from the db? It's
plain text at that point, no?
I wish it was that simple. All the data passed from the PDF form is
basically binary, and I haven't been able to figure out how to extract
the actual specific form field data in the file.
Mike