OK, I give up.

htdig 3.1.5 installed with no problems on my win2k machine running 
Cygwin.  It indexes HTML files just fine.  But it broke completely when I 
installed the latest version of xpdf and tried to index some PDF files.  It 
looks like every line of the pdftotext output is being flagged with an 
external parser error.  Some of the error messages are appended (with some 
modifications so I can see exactly where it was failing).

What's going on here?  The code in ExternalParser.cc appears to expect 
something completely different.  The big switch statement seems 
to be looking for HTML tags inside the <> brackets.  But that's not what I 
have at the beginning of my lines.  I can't tell what this routine is 
looking for, but it doesn't appear to be raw HTML input.  (I ran the 
doc2html command it is trying to run and got a perfectly valid HTML file 
from the PDF.)
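
One rough way to check what kind of output the converter is handing back 
is sketched below.  It is only a sketch under an assumption that is not 
confirmed by the code quoted here: that ExternalParser wants one "record" 
per line, meaning a single letter, a tab, and then tab-separated fields, 
rather than raw HTML.  The names classify_line and summarize are made up 
for this illustration and are not part of htdig.

```python
import re

# Assumption (not taken from ExternalParser.cc): a parser "record" is a
# single letter, then a tab, then tab-separated fields.
RECORD_RE = re.compile(r"^[A-Za-z]\t")

# A line that opens with an HTML tag such as <HTML> or </HEAD>.
HTML_RE = re.compile(r"^\s*</?[A-Za-z][^>]*>")


def classify_line(line):
    """Return 'record', 'html', or 'other' for one line of output."""
    if RECORD_RE.match(line):
        return "record"
    if HTML_RE.match(line):
        return "html"
    return "other"


def summarize(lines):
    """Count how many lines fall into each category."""
    counts = {"record": 0, "html": 0, "other": 0}
    for line in lines:
        counts[classify_line(line.rstrip("\n"))] += 1
    return counts
```

Feeding the saved doc2html output through summarize() (for example via 
open(path) on a captured file) would show whether the lines look like HTML 
or like tab-separated records.  If nearly everything lands in "html" or 
"other", the parser and the converter are probably not speaking the same 
format.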

Shouldn't doc2html be returning an HTML file, and why doesn't 
ExternalParser seem to be able to read it?

Help!

-- Malcolm

Note: I've added some debugging output to ExternalParser.cc so I can see 
which command is being run and which parser error I was receiving (this is 
the one after the default entry in the switch).

./rundig
Pipe command is /usr/local/bin/doc2html.pl /opt/www/htdig/db/htdext.31408 
application/pdf "http://localhost/foo.pdf" /opt/www/htdig/conf/htdig.conf
External parser error 8 in line:<HTML>
  URL: http://localhost/foo.pdf
External parser error 8 in line:<HEAD>
  URL: http://localhost/foo.pdf
External parser error 8 in line:<TITLE>rj.dvi [foo.pdf]</TITLE>
  URL: http://localhost/foo.pdf
External parser error 8 in line:</HEAD>
  URL: http://localhost/foo.pdf
External parser error 8 in line:<BODY>
  URL: http://localhost/foo.pdf
External parser error 8 in line:<PRE>
  URL: http://localhost/foo.pdf
External parser error 8 in line:F
  URL: http://localhost/foo.pdf
External parser error 5 in line:ast Algorithms for Mining Asso ciation Rules


