I basically run our PDFs through a PDF-to-text converter and extract the data from the resulting text files. It is pretty simple.
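
A rough sketch of that workflow, in Python purely for illustration (the same regex idea carries over to Perl). It assumes the converter is the pdftotext command-line tool and that the fields show up in the text output as "Label: value" lines. The labels Name and Email, the sample text, and the helper names pdf_to_text and extract_fields are all hypothetical. Whether form field values appear in the converted text at all depends on how the PDF was generated, so check the converter output first.

```python
import re
import subprocess


def pdf_to_text(pdf_path, txt_path):
    """Convert a PDF to text with the external pdftotext tool (assumed installed)."""
    # -layout asks pdftotext to keep the physical layout of the page
    subprocess.run(["pdftotext", "-layout", pdf_path, txt_path], check=True)


def extract_fields(text, labels):
    """Return {label: value} for each 'Label: value' line found in the text."""
    fields = {}
    for label in labels:
        # Match the label at the start of a line, then capture the rest of that line
        pattern = rf"^[ \t]*{re.escape(label)}[ \t]*:[ \t]*(.*)$"
        match = re.search(pattern, text, re.MULTILINE)
        if match:
            fields[label] = match.group(1).strip()
    return fields


# Hypothetical converter output, standing in for the contents of the text file
sample = "Order Form\nName:  Jane Doe\nEmail: jane@example.com\n"
print(extract_fields(sample, ["Name", "Email"]))
```

Labels that are not found are simply left out of the result, so the caller can tell which fields were missing from a given file.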

On 3/3/2011 6:21 AM, Mike Blezien wrote:
----- Original Message ----- From: "shawn wilson" <ag4ve...@gmail.com>
Cc: "Perl List" <beginners@perl.org>
Sent: Thursday, March 03, 2011 6:04 AM
Subject: Re: Extracting Data from PDF files


On Mar 3, 2011 6:35 AM, "Mike Blezien" <mick...@frontiernet.net> wrote:

----- Original Message ----- From: "shawn wilson" <ag4ve...@gmail.com>
Cc: "Perl List" <beginners@perl.org>
Sent: Thursday, March 03, 2011 5:22 AM
Subject: Re: Extracting Data from PDF files



On Mar 3, 2011 6:07 AM, "Mike Blezien" <mick...@frontiernet.net> wrote:


Hello,

I posted a question earlier about creating a PDF file from a PDF form
submission, which we now have working. We are able to create the PDF file
to be attached to an email.


The issue I'm having now is extracting some specific data from the PDF
files we create. We need to extract a couple of form field values from
each file. I've been reviewing the various PDF modules and haven't been
able to figure it out. The modules I've been looking at are
PDF::API2::Simple and PDF::FDF::Simple. These seem to just create/edit
PDF files, but I need to extract specific data from the created PDF file.


Is there another way to do this with these modules, or some other method?


Maybe I'm missing something, but why don't you just dump all of the form
data into a db? Then you can create as many PDFs as you like. I mean, I've
used a PDF scraping module (you can even do OCR with one), but it isn't fun
because the data is generally not nicely formatted for this. This probably
isn't the case for you, but who cares, because you have access to the
pre-processed data.


Shawn,

You mean dump it into a database (db)? The data is mostly all binary, so
I'm not sure how you'd "scrape" it to extract the data, but I'm not really
familiar with this approach :)

You said your data was coming from a PDF form, right? I've never done this
per se; however, IIRC, the data is posted to a db, a web CGI, or a text
file. If this is the case, why not get the text from the db? It's plain
text at that point, no?

I wish it was that simple. All the data passed from the PDF form is basically binary, and I haven't been able to figure out how to extract the specific form field data from the file.

Mike


