OK, I give up.
htdig 3.1.5 installed with no problems on my Win2k machine running
Cygwin. It indexes HTML files just fine, but it broke completely when I
installed the latest version of xpdf and tried to index some PDF files. It
looks like every line of the pdftotext output gets flagged with an
external parser error. Some of the error messages are appended below (from
a build I modified so I can see exactly where it fails).
What's going on here? The code in ExternalParser.cc seems to expect
something completely different. The big switch statement seems to be
looking for HTML tags inside the <> brackets, but that's not what I have at
the beginning of my lines. I can't tell what this routine expects, but it
doesn't appear to be raw HTML input. (I ran the doc2html command it is
trying to run by hand and got a perfectly valid HTML file from the PDF.)
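
To make the mismatch concrete, here is a minimal sketch in Python (not the
actual htdig code) of how a switch on the first character of each line would
treat my output. The set of record codes, the tab-separated field check, and
the names ASSUMED_CODES and classify are all assumptions for illustration;
none of them is taken from ExternalParser.cc.

```python
# Hypothetical single-letter record codes. This set is an assumption made
# for illustration only; it is NOT copied from ExternalParser.cc.
ASSUMED_CODES = {"w", "u", "t", "h", "a", "i", "m"}


def classify(line):
    """Classify one line the way a first-character switch might."""
    if not line:
        return "empty"
    code = line[0]
    if code not in ASSUMED_CODES:
        # Equivalent to falling through to the default entry of a switch.
        return "unknown record type"
    if "\t" not in line:
        # Assumed rule: a known code must be followed by tab-separated fields.
        return "malformed record"
    return "ok"


if __name__ == "__main__":
    samples = [
        "<HTML>",
        "<PRE>",
        "F",
        "ast Algorithms for Mining Asso ciation Rules",
        "t\tSome title",
    ]
    for sample in samples:
        print(repr(sample), "->", classify(sample))
```

Under these assumptions every line starting with "<" or "F" falls through to
the default case, while a line that happens to start with a known letter is
rejected for a different reason. That might be why my last line reports a
different error number, but that is only a guess.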
Isn't doc2html supposed to return an HTML file? And if so, why can't
ExternalParser read it?
Help!
-- Malcolm
Note: I've added some debugging output to ExternalParser.cc so I can see
what command is being run and which parser error I am receiving (this is
the one after the default entry in the switch).
./rundig
Pipe command is /usr/local/bin/doc2html.pl /opt/www/htdig/db/htdext.31408 application/pdf "http://localhost/foo.pdf" /opt/www/htdig/conf/htdig.conf
External parser error 8 in line:<HTML>
URL: http://localhost/foo.pdf
External parser error 8 in line:<HEAD>
URL: http://localhost/foo.pdf
External parser error 8 in line:<TITLE>rj.dvi [foo.pdf]</TITLE>
URL: http://localhost/foo.pdf
External parser error 8 in line:</HEAD>
URL: http://localhost/foo.pdf
External parser error 8 in line:<BODY>
URL: http://localhost/foo.pdf
External parser error 8 in line:<PRE>
URL: http://localhost/foo.pdf
External parser error 8 in line:F
URL: http://localhost/foo.pdf
External parser error 5 in line:ast Algorithms for Mining Asso ciation Rules