Re: VM Collection

Adam Thornton Thu, 21 Aug 2003 12:45:37 -0700

On Thu, 2003-08-21 at 13:55, David Boyes wrote:
> On the full Adobe Acrobat CD there is a tool (called PDFINDEX) that you can
> run to generate a searchable index from a bunch of PDF files in a directory.
> It reads all the files, creates word lists, and creates a searchable index
> that you can use from within Acrobat or the Acrobat Reader to find words or
> phrases in a collection of PDF files. It's beastly slow and sucks CPU like
> crazy, but it works pretty well. We run it once a week on our directories
> full of PDF files, and the resulting index is very helpful. It's not as good
> as the search function in Bookmanager, but it also allows you to index
> documents from multiple vendors using the same tool.


And if you don't have Acrobat, you can actually do this for free and
make yourself a web-searchable index while you're at it.  Use one of
Adobe's tools at
http://www.adobe.com/products/acrobat/access_onlinetools.html to convert
the PDF to HTML.  Then use htdig to index the resulting HTML.

I did this (with a different converter--probably Ghostscript and
ps2ascii) a few years ago with a CD collection of magazines, and it
worked quite well.
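
For anyone curious what the indexing step amounts to once the PDFs are
converted, here is a rough sketch in Python. To be clear, this is not
htdig and not how htdig works internally; it is a toy inverted index
that only illustrates the convert-then-index idea. The function names
(html_to_text, build_index, search) and the sample page names are made
up for illustration.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html):
    """Return the text content of an HTML string (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def build_index(pages):
    """Map each lowercase word to the set of page names containing it.

    `pages` is a dict of {page name: HTML string}, e.g. the output of
    whatever PDF-to-HTML converter was used.
    """
    index = defaultdict(set)
    for name, html in pages.items():
        for word in re.findall(r"[a-z0-9]+", html_to_text(html).lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return the sorted page names that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)
```

In use, you would read each converted file into the `pages` dict, call
build_index(pages) once, and then call search(index, "some words") as
often as you like. A real indexer such as htdig adds ranking, stemming,
and an on-disk database, none of which this sketch attempts.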

Adam
