Re: Extracting Data from PDF files

shawn wilson Thu, 03 Mar 2011 04:05:25 -0800
On Mar 3, 2011 6:35 AM, "Mike Blezien" <mick...@frontiernet.net> wrote:
>
> ----- Original Message ----- From: "shawn wilson" <ag4ve...@gmail.com>
> Cc: "Perl List" <beginners@perl.org>
> Sent: Thursday, March 03, 2011 5:22 AM
> Subject: Re: Extracting Data from PDF files
>
>
>
>> On Mar 3, 2011 6:07 AM, "Mike Blezien" <mick...@frontiernet.net> wrote:
>>>
>>>
>>> Hello,
>>>
>>> I posted a question earlier about creating a PDF file from a PDF form
>>> submission which we now have working. We are able to create the PDF file
>>> to be attached to an email.
>>>
>>>
>>> The issue I'm having now is extracting some specific data from the PDF
>>> files we create. We need to extract a couple of form field values from
>>> each created PDF file. I've been reviewing the various PDF modules and
>>> haven't been able to figure it out. The modules I've been looking at are
>>> PDF::API2::Simple and PDF::FDF::Simple. These seem to just create/edit
>>> PDF files, but I need to extract specific data from the created PDF file.
>>>
>>>
>>> Is there another way to do this with these modules or some other method?
>>>
>>>
>> Maybe I'm missing something, but why don't you just dump all of the form
>> data into a db? Then you can create as many PDFs as you like. I mean, I've
>> used a PDF scraping module (you can even do OCR with one), but it isn't fun
>> because the data is generally not nicely formatted for this. That probably
>> isn't the case for you, but who cares, because you have access to the
>> pre-processed data.
>
>
> Shawn,
>
> You mean dump it into a database (db)? The data is mostly all binary, so
> I'm not sure how you'd "scrape" it to extract the data, but I'm not really
> familiar with this approach :)
>
You said your data was coming from a PDF form, right? I've never done this
per se; however, IIRC, the data is posted to a db, a web CGI, or a text file.
If this is the case, why not get the text from the db? It's plain text at
that point, no?
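
A minimal sketch of that idea in Perl, under some loud assumptions: the PDF
form is set up to submit its fields in HTML (URL-encoded) format to a script
you control, and a tab-separated text file is an acceptable stand-in for the
db. The sub names (parse_form, save_submission, load_submissions) and the
sample field names are hypothetical, not from any PDF module. The point is
only that the field values get captured as plain text at submission time, so
nothing ever has to be scraped back out of the generated PDF.

```perl
use strict;
use warnings;

# Decode a URL-encoded form body such as "name=Mike+B&note=a%26b"
# into a hash of field name => value.
sub parse_form {
    my ($raw) = @_;
    my %field;
    for my $pair (split /[&;]/, $raw) {
        next unless length $pair;
        my ($key, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        for ($key, $value) {
            tr/+/ /;                               # '+' encodes a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # %XX encodes one byte
        }
        $field{$key} = $value;
    }
    return %field;
}

# Append one submission to a text file as a single line of
# tab-separated key=value pairs.
sub save_submission {
    my ($file, %field) = @_;
    open my $fh, '>>', $file or die "Cannot append to $file: $!";
    my @pairs;
    for my $key (sort keys %field) {
        my $value = $field{$key};
        $value =~ s/[\t\r\n]+/ /g;   # keep each record on one line
        push @pairs, "$key=$value";
    }
    print {$fh} join("\t", @pairs), "\n";
    close $fh or die "Cannot close $file: $!";
    return;
}

# Read every stored submission back as a list of hash references.
sub load_submissions {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot read $file: $!";
    my @rows;
    while (my $line = <$fh>) {
        chomp $line;
        my %row;
        for my $pair (split /\t/, $line) {
            my ($key, $value) = split /=/, $pair, 2;
            $row{$key} = defined $value ? $value : '';
        }
        push @rows, \%row;
    }
    close $fh;
    return @rows;
}
```

The submission handler would call parse_form on the posted body and pass the
result to save_submission before (or instead of) building the PDF. Later,
load_submissions returns the same field values without touching the PDF at
all. A real setup would more likely use DBI and a proper table; the text file
just keeps the sketch self-contained.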