Dennis,

You might take a look at using a combination of pstotext
( http://www.research.compaq.com/SRC/virtualpaper/pstotext.html ) and
Ghostscript ( http://www.ghostscript.com/ ).

I currently use this arrangement to do pretty much what you are requesting.
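
A rough sketch of what such an arrangement could look like, assuming pstotext and Ghostscript are both installed and on the PATH. Python is used here only for illustration, and the helper names and file paths are hypothetical. From CF the same command line would have to be run through whatever mechanism the server allows.

```python
import subprocess
from typing import List, Optional


def build_pstotext_command(pdf_path: str,
                           output_path: Optional[str] = None) -> List[str]:
    """Build a pstotext command line (hypothetical helper name)."""
    cmd = ["pstotext"]
    if output_path is not None:
        # -output sends the extracted text to a file instead of stdout.
        cmd += ["-output", output_path]
    cmd.append(pdf_path)
    return cmd


def extract_text(pdf_path: str) -> str:
    """Run pstotext, which drives Ghostscript itself, and return the text.

    Assumes pstotext's default ISO 8859-1 output; adjust the decoding
    if your build behaves differently.
    """
    result = subprocess.run(
        build_pstotext_command(pdf_path),
        capture_output=True,
        check=True,  # raise CalledProcessError if pstotext fails
    )
    return result.stdout.decode("latin-1")
```

Calling extract_text("upload.pdf") would return the plain text as a string, which could then be stored in a database text field next to the other record data.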

HTH,

Kevin

At 02:36 PM 7/13/01 -0400, you wrote:
>For this project we are using CF 4.01 (client's server) and we are using
>Verity to index the database, with excellent results.  Using Verity to index
>the PDF files is not an option in this case.
>
>What we are looking for is either a CFX or a COM object that can be called
>from CF to extract the raw unformatted ASCII text from the PDF files.
>Anyone know of such an animal?
>
>Best Regards,
>
>Dennis Powers
>UXB Internet
>(203) 879-2844
>http://www.uxbinfo.com/
>
>-----Original Message-----
>From: Robert Rusher [mailto:[EMAIL PROTECTED]]
>Sent: Friday, July 13, 2001 1:41 PM
>To: CF-Talk
>Subject: RE: Extracting Text from PDF Documents with CF
>
>Verity requires a filter for the indexing of PDF
>documents. This filter is provided with CF in most
>cases.
>The version of Verity that is included in ColdFusion
>4.5.1 for Linux and HP-UX does not include a filter
>for PDF files.
>
>Regards,
>Rob
>--- Jann VanOver <[EMAIL PROTECTED]> wrote:
> > I thought Verity could index the text in PDFs
> > automatically!  It did with
> > previous versions of Acrobat.  You can create a
> > Verity index that indexes
> > your database AND other files (pdfs) that you want.
> > Try it and see.
> >
> > -----Original Message-----
> > From: Dennis Powers [mailto:[EMAIL PROTECTED]]
> > Sent: Friday, July 13, 2001 9:00 AM
> > To: CF-Talk
> > Subject: Extracting Text from PDF Documents with CF
> >
> >
> > Hi,
> >
> > I am wondering if anyone has a method of extracting
> > the raw (unformatted)
> > text from a PDF file using CF?  I have a project
> > where we need to index PDF
> > files AND associated information in a database.  We
> > are currently using
> > Verity for searching the database with great success
> > but now we need to
> > index the PDF files that are associated with the
> > data records in the
> > database.
> >
> > When a user uploads a new PDF I would like to
> > extract the text from it and
> > add it to the database with the other information.
> > Then I can use Verity to
> > search all the data fields AND the PDF text data
> > field.
> >
> > A CFX or a COM object would be nice so that I can
> > call it from CF. I would
> > be very appreciative if anyone can steer me to a tag
> > or object that can
> > accomplish this task.
> >
> > Best Regards,
> >
> > Dennis Powers
> > UXB Internet
> > (203) 879-2844
> > http://www.uxbinfo.com/
> >

-----
Kevin Ward
Web Developer
Lattice Semiconductor Corporation
5555 NE Moore Court
Hillsboro, OR 97124-6421
Ph: 503.268.8656
Fx: 503.268.8693
[EMAIL PROTECTED]
http://www.latticesemi.com/

