Yep, it is absolutely possible to pull the text out of at least some PDFs. (As 
you mentioned, some PDFs contain text that cannot be detected as text because 
it is stored as image content.) As I mentioned yesterday, Raymond Camden's 
PDFUtils package has functionality for doing this very thing using 
ColdFusion 8. However, the data that you get out is unstructured. 
Mischa: if you have an example of a PDF document from which you are able to 
pull structured data, please share those details.
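
To illustrate what "unstructured" means in practice, here is a minimal sketch in Python (not ColdFusion, and not part of PDFUtils). It assumes the text has already been extracted from the PDF. The sample text, the field labels, and the `parse_po` helper are all hypothetical; a real document would need patterns written for its own layout.

```python
import re

# Hypothetical text, as a PDF text extractor might return it for a purchase
# order: one flat string, with no notion of fields or table cells.
SAMPLE_TEXT = """ACME Supply Co.
Purchase Order
PO Number: 10482   Date: 08/06/2008
Qty Item Unit Price
3 Widget 4.50
10 Gasket 0.25
Total: 16.00
"""

# Line items are assumed to look like "<qty> <item> <price>" on a single line.
ITEM_LINE = re.compile(r"^(\d+)[ \t]+(\S+)[ \t]+(\d+\.\d{2})[ \t]*$", re.MULTILINE)


def parse_po(text):
    """Recover structure from flat text using layout-specific patterns."""
    po = re.search(r"PO Number:\s*(\d+)", text)
    date = re.search(r"Date:\s*(\d{2}/\d{2}/\d{4})", text)
    items = [
        {"qty": int(qty), "item": name, "unit_price": float(price)}
        for qty, name, price in ITEM_LINE.findall(text)
    ]
    return {
        "po_number": po.group(1) if po else None,
        "date": date.group(1) if date else None,
        "items": items,
    }


result = parse_po(SAMPLE_TEXT)
print(result["po_number"], result["date"], len(result["items"]))
```

The structure comes from the patterns, not from the PDF: if the layout changes, the patterns have to change with it.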

Josh

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Mischa 
Uppelschoten ext 10
Sent: Wednesday, August 06, 2008 3:54 PM
To: Web Site
Subject: re[2]: [ACFUG Discuss] CF8 & PDFs

It is not necessary to use a true PDF form (with identified fields and values) 
to be able to extract text from it. I just printed a PO from our system using 
an open source PDF printer and then converted it back into Excel. There are 
some applications that use a bitmap format like tiff or jpg as an intermediary 
to produce a PDF. Documents from these systems could obviously not easily be 
converted back into text.
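
As a rough illustration of that distinction, here is a heuristic sketch in Python that guesses how a PDF was produced by looking for marker names in its raw bytes. The `classify_pdf` function and its labels are hypothetical. It assumes the relevant dictionaries are not hidden inside compressed object streams, so it can misclassify real files, and a scanned document with an OCR text layer would show up as "text".

```python
import re

FONT_MARKER = re.compile(rb"/Font\b")               # font resources imply real text
IMAGE_MARKER = re.compile(rb"/Subtype\s*/Image\b")  # embedded bitmap images


def classify_pdf(pdf_bytes):
    """Guess whether a PDF holds form fields, text, or only page images."""
    if b"/AcroForm" in pdf_bytes:
        return "form"        # interactive form dictionary is present
    if FONT_MARKER.search(pdf_bytes):
        return "text"        # text drawn with fonts, likely extractable
    if IMAGE_MARKER.search(pdf_bytes):
        return "image-only"  # bitmap pages; would need OCR to get text
    return "unknown"


# Tiny hand-made byte fragments standing in for real files (assumptions).
printed = b"%PDF-1.4 1 0 obj << /Type /Font /Subtype /Type1 >> endobj"
scanned = b"%PDF-1.4 1 0 obj << /Type /XObject /Subtype /Image >> endobj"
form = b"%PDF-1.4 1 0 obj << /AcroForm << /Fields [] >> >> endobj"

print(classify_pdf(printed), classify_pdf(scanned), classify_pdf(form))
```

On a real file you would pass the result of `open(path, "rb").read()`; a proper PDF parser is the more reliable route.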
/m


: I think (Josh please correct me) that the availability of the data in the
:  final PDF file has more to do with how the PDF was created. If you just use a
:  PDF printer to create the file your data isn't available, but my understanding
:  was that if you used a "PDF form" on the input side then the data is still
:  in those form fields in the binary file and thus extractable. Otherwise I
:  don't see what the excitement would be about with PDF workflows in LiveCycle.
:

: On Wed, Aug 6, 2008 at 3:14 PM, Jeff Howard <[EMAIL PROTECTED]> wrote:
:

: Ok, I took your advice from your first post and converted it to text and
:  dumped it just to see what I would be dealing with.  I understand what you
:  are saying about pulling structured data from an unstructured source.  I
:  guess my thought was if the document was created from a form, that there
:  may be a way to pull the form fields out once the document had been
:  created.... and the answer is no.


: --
: Howard Fore, [EMAIL PROTECTED]
: "The universe tends toward maximum irony. Don't push it." - Jeff Atwood
:
: -------------------------------------------------------------
: To unsubscribe from this list, manage your profile @
: http://www.acfug.org?fa=login.edituserform
:
: For more info, see http://www.acfug.org/mailinglists
: Archive @ http://www.mail-archive.com/discussion%40acfug.org/
: List hosted by FusionLink
: -------------------------------------------------------------




Mischa Uppelschoten
The Banker's Exchange, LLC.
4200 Highlands Parkway SE
Suite A
Smyrna, GA 30082-5198

Phone:    (404) 605-0100 ext. 10
Fax:    (404) 355-7930
Web:    www.BankersX.com
Follow this link for Instant Web Chat:
http://www.bankersx.com/Contact/chat.cfm?Queue=MUPPELSCHOTEN
----------------------- Original Message -----------------------

From: "Howard Fore" <[EMAIL PROTECTED]>
To: [email protected]
Date: Wed, 6 Aug 2008 15:39:08 -0400
Subject: Re: [ACFUG Discuss] CF8 & PDFs

I think (Josh please correct me) that the availability of the data in the final 
PDF file has more to do with how the PDF was created. If you just use a PDF 
printer to create the file your data isn't available, but my understanding was 
that if you used a "PDF form" on the input side then the data is still in 
those form fields in the binary file and thus extractable. Otherwise I don't 
see what the excitement would be about with PDF workflows in LiveCycle.

On Wed, Aug 6, 2008 at 3:14 PM, Jeff Howard <[EMAIL PROTECTED]> wrote:

Ok, I took your advice from your first post and converted it to text and dumped 
it just to see what I would be dealing with.  I understand what you are saying 
about pulling structured data from an unstructured source.  I guess my thought 
was if the document was created from a form, that there may be a way to 
pull the form fields out once the document had been created.... and the answer 
is no.



--
Howard Fore, [EMAIL PROTECTED]
"The universe tends toward maximum irony. Don't push it." - Jeff Atwood
