Title: RE: PDF indexing not working in Pro Trial for Windows
I was rather hoping to get the Windows binary version working. I don't think the client will want to compile the Linux version, and it will be one of their people who installs it on their live web servers.
 
Does anyone know if the 1k limitation on the trial version causes PDFs not to get properly indexed?
 
BR, Joe
-----Original Message-----
From: Holmes, Gregory [mailto:[EMAIL PROTECTED]]
Sent: 13 August 2001 13:29
To: '[EMAIL PROTECTED]'; 'Joe Frost'
Subject: RE: PDF indexing not working in Pro Trial for Windows

Joe:

I've successfully set up mnogo on NT with MySQL.  However, I did it by compiling the Unix source version with Cygwin (http://sources.redhat.com/cygwin/) rather than using the Windows binary.

Here are the lines from indexer.conf that I use:

Mime "application/pdf; charset=iso-8859-1"  "text/html"                  "/usr/local/bin/pdf2html.pl $1 application/pdf"

Mime application/pdf  "text/html"                  "/usr/local/bin/pdf2html.pl $1 application/pdf"

pdf2html.pl is a contributed script from htdig (www.htdig.org) that uses pdfinfo and pdftotext to construct a web page and feed it back to the indexer.  This way the document metadata gets indexed as well as the text.  Obviously, you need Perl for this to work.
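
For anyone who wants to see the idea without reading the Perl, here is a rough Python sketch of what a converter of this kind does. It is not the htdig script itself: the function names (parse_pdfinfo, build_html, pdf_to_html) are made up for illustration, and the choice of Title, Author, Subject and Keywords as the fields to carry over is an assumption. It only relies on pdfinfo printing "Key: value" lines and on pdftotext writing to stdout when the output file is given as "-".

```python
import html
import subprocess


def parse_pdfinfo(output):
    """Turn pdfinfo's "Key: value" lines into a dict."""
    info = {}
    for line in output.splitlines():
        # Split on the first colon only, so values such as dates keep theirs.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def build_html(info, text):
    """Wrap the extracted text and selected metadata in a small HTML page."""
    title = html.escape(info.get("Title", ""))
    metas = []
    # Assumption: these are the fields worth exposing as meta tags.
    for key in ("Author", "Subject", "Keywords"):
        if info.get(key):
            metas.append(
                '<meta name="%s" content="%s">' % (key.lower(), html.escape(info[key]))
            )
    head = "<title>%s</title>\n%s" % (title, "\n".join(metas))
    body = "<pre>%s</pre>" % html.escape(text)
    return "<html><head>%s</head>\n<body>%s</body></html>\n" % (head, body)


def pdf_to_html(path):
    """Run pdfinfo and pdftotext on a PDF and return the resulting HTML page."""
    info_out = subprocess.run(
        ["pdfinfo", path], capture_output=True, text=True, check=True
    ).stdout
    # "-" as the output file makes pdftotext write the text to stdout.
    text_out = subprocess.run(
        ["pdftotext", path, "-"], capture_output=True, text=True, check=True
    ).stdout
    return build_html(parse_pdfinfo(info_out), text_out)
```

Printing the result of pdf_to_html for the file the indexer passes in would give it a text/html document to parse, which is the role the Mime lines above assign to the external command.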

Also, there might have been a default line in indexer.conf excluding PDFs from indexing. If so, you'll have to remove it or comment it out.

Works for me, hope it helps.

Greg Holmes

-----Original Message-----
From: Joe Frost [mailto:[EMAIL PROTECTED]]
Subject: PDF indexing not working in Pro Trial for Windows

...........
a client of mine who hosts exclusively on NT wants to set it up as a search
engine for a new project they have. Much of the content will be in PDFs so
this feature is vital.

I have set up my own test system using the current Pro Trial version on
Windows 2000 with MySQL. Indexing of normal html URLs works fine but
indexing of PDFs does not. I'm using pdftotext.exe with the settings
suggested in the help file including using "/" instead of the normal Windows
"\". The PDF is fetched by the indexer and seems to be briefly parsed but
the URL is not included in any subsequent searches for terms that it is
known to include.

Is this a restriction of the trial version or am I doing something wrong?
