Yup, as long as the PDF contains actual text and not just scanned images.

ps2ascii extracts ASCII text from PostScript or PDF files. There's also a 
program called pdftotext that does pretty much the same thing.

The ASCII text will show up in the filename.pdf.txt file. Once you have 
indexed all the files called filename.pdf.txt, you're set.
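
For what it's worth, here is a rough sketch of that conversion step as a
small sh function, in case file names with spaces are a concern. It assumes
the converter (pdftotext by default, or ps2ascii) is called as
"converter input output". The names convert_pdfs and original_pdf are made
up for illustration, and none of this has been tested against a real CF
collection.

```sh
#!/bin/sh
# Sketch only: convert_pdfs and original_pdf are illustrative names,
# not part of ps2ascii, pdftotext, or ColdFusion.

# convert_pdfs DIR [CONVERTER]
# Writes "name.pdf.txt" next to every "name.pdf" found under DIR.
# CONVERTER must accept "input output" arguments (assumed: pdftotext, ps2ascii).
convert_pdfs() {
    dir=$1
    converter=${2:-pdftotext}
    # -exec ... {} + hands file names to the inner shell as separate
    # arguments, so paths containing spaces survive intact.
    find "$dir" -type f -name '*.pdf' -exec sh -c '
        converter=$1; shift
        for pdf in "$@"; do
            "$converter" "$pdf" "$pdf.txt" || echo "failed: $pdf" >&2
        done
    ' sh "$converter" {} +
}

# original_pdf PATH
# Maps an indexed "name.pdf.txt" path back to the PDF it came from.
original_pdf() {
    printf '%s\n' "${1%.txt}"
}
```

You would call it as "convert_pdfs /path/to/pdf" (or pass ps2ascii as the
second argument), then index the resulting .txt files. original_pdf shows
the same suffix stripping you would do when linking search results back to
the PDFs.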

Cheers,

-- Brett
"No man is an island; No tool will do every job."


At 12:38 PM 10/9/2002 -0500, Mark A. Kruger - CFG wrote:
>Pretty cool... will all the text inside the PDF show up after conversion?
>
>-----Original Message-----
>From: B Schlank [mailto:[EMAIL PROTECTED]]
>Sent: Wednesday, October 09, 2002 10:55 AM
>To: CF-Linux
>Subject: RE: Indexing PDFs
>
>
>This will sound very "non-CF", but it's not.
>
>Why don't you just create a shell script to do it for you?
>
>          find /path/to/pdf -name "*.pdf" | awk '{print $1 " " $1 ".txt"}' |
>              xargs -n 2 ps2ascii
>
>It finds all the PDFs in /path/to/pdf and converts each one with ps2ascii,
>saving the text to a file called "filename.pdf.txt" in the same directory.
>
>When you run your search in CF, you'll get back "filename.pdf.txt" as
>the file to point to. Just strip the ".txt" extension as you output the
>search results or, to save a few milliseconds of processing per result,
>create a jump page that strips it only when someone clicks a link.
>
>You could always use CFexecute or Java to do this, but why waste the CF
>processing?
>
>Cheers,
>
>-- Brett
>"No man is an island; No tool will do every job."
>
>At 07:10 AM 10/9/2002 -0600, Jillian Carroll wrote:
> >I can understand that. :)
> >
> >From what I can tell, the reason CF on Linux doesn't index PDFs is that
> >the filter was not purchased from Verity... here is my question:
> >
> >-       I can run ps2ascii on my Linux box to convert the PDFs to text...
> >which CF will be able to index.
> >
> >Given this, could anybody give me any guidance/suggestions on how I might
> >create some sort of custom tag for CF to do this on the fly? Is that even
> >possible?
> >
> >My only other alternative (that I can see) would be to create a shadow
> >directory and use ps2ascii and shadow ALL of the PDFs on this site (400 or
> >so)... let CF index that directory and then manipulate the search results
> >to point back to the original PDF.  I'd rather not have to do this.
> >
> >--
> >Jillian
> >
> >-----Original Message-----
> >From: Jesse Noller [mailto:[EMAIL PROTECTED]]
> >Sent: Wednesday, October 09, 2002 7:02 AM
> >To: CF-Linux
> >Subject: RE: Indexing PDFs
> >
> >
> >See, this is why I need to finish my coffee before posting.
> >
> >Jesse Noller
> >[EMAIL PROTECTED]
> >Macromedia Server Development
> >
> >"No concept man forms is valid unless he
> >integrates it without contradiction into the
> >sum of his knowledge."
> >- Ayn Rand
> >
> > > -----Original Message-----
> > > From: Jillian Carroll [mailto:[EMAIL PROTECTED]]
> > > Sent: Wednesday, October 09, 2002 8:57 AM
> > > To: CF-Linux
> > > Subject: RE: Indexing PDFs
> > >
> > > Jesse,
> > >
> > > I am well aware of this... hence my asking for alternative suggestions.
> > >
> > > --
> > > Jillian
> > >
> > > -----Original Message-----
> > > From: Jesse Noller [mailto:[EMAIL PROTECTED]]
> > > Sent: Wednesday, October 09, 2002 6:24 AM
> > > To: CF-Linux
> > > Subject: RE: Indexing PDFs
> > >
> > >
> > > Read the release notes. AFAIK, indexing PDFs on Linux is not, and has
> > > not been, supported.
> > >
> > > Jesse Noller
> > > [EMAIL PROTECTED]
> > > Macromedia Server Development
> > >
> > > > -----Original Message-----
> > > > From: Jillian Carroll [mailto:[EMAIL PROTECTED]]
> > > > Sent: Tuesday, October 08, 2002 2:34 PM
> > > > To: CF-Linux
> > > > Subject: Indexing PDFs
> > > >
> > > > I'm really running into a problem with the fact that CF on Linux
> > > > cannot index PDFs... even though it works perfectly well on Windows.
> > > >
> > > > Does anybody have any suggestions for me?  I'd be VERY appreciative!
> > > >
> > > > --
> > > > Jillian
> > > >
> > > >
> > >
> > >
> >
> >
>
>
