Hi Gilles,

El Perfecto:

Thank you very much for your giant leap for PDF kind;) 

I applied your second patch to parse_doc.pl and Derek's fix to
xpdf/TextOutputDev.cc; now all the PDF files in my search path are indexed
using the external_parsers directive in the config file:

external_parsers:  application/msword /usr/local/bin/parse_doc.pl \
                   application/postscript /usr/local/bin/parse_doc.pl \
                   application/pdf /usr/local/bin/parse_doc.pl

-.0001:

One crappy PDF file creates a score of errors during the dig:

  External parser error in line:w^@(Garbage)*

It also appears in the search results as:

  Word Document prereg.pdf

instead of

  PDF Document prereg.pdf

The file is:

  http://www.ccsf.cc.ca.us/Resources/Title3/training/prereg.pdf

It can be searched with:

  http://www.ccsf.cc.ca.us/cgi-bin/htsearch?config=htdig&restrict=\
  &exclude=&words=pre-registration+form&method=and&format=builtin-short

No other word in that file gives a search result, so I guess the error
happened near the top of the file, right after the line "Pre-Registration
Form".
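
One way to check that guess would be to scan the text the parser emits for
this file and report the first lines that contain NUL or other control
bytes, like the ^@ in the error above. This is only a minimal sketch: the
function name and the sample data are made up for illustration, and the
tab-separated "w" records in the sample are an assumption about what the
parser output looks like, not something taken from parse_doc.pl itself.

```python
import re

# Bytes that should not appear in plain-text parser output: NUL and the
# other C0 control characters, except tab, line feed and carriage return.
CONTROL_BYTES = re.compile(rb"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def find_garbage_lines(data: bytes):
    """Return (line_number, line) pairs for lines containing control bytes."""
    hits = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        if CONTROL_BYTES.search(line):
            hits.append((number, line))
    return hits


# Hypothetical sample: two clean records with one corrupted record between.
sample = b"w\tpre\t1\t0\nw\x00\xff\xfe\nw\tform\t2\t0\n"

for number, line in find_garbage_lines(sample):
    # Show only the first 40 bytes so binary junk does not flood the screen.
    print(f"line {number}: {line[:40]!r}")
```

If the first reported line number is small, that would be consistent with
the conversion going wrong right after the title line.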

P.S.  I couldn't correspond during the week because I had a hectic one; 
come to think of it, I have one every week;) 

Best regards,

Joe

     _/   _/_/_/       _/              ____________    __o
     _/   _/   _/      _/         ______________     _-\<,_
 _/  _/   _/_/_/   _/  _/                     ......(_)/ (_)
  _/_/ oe _/   _/.  _/_/ ah        [EMAIL PROTECTED]


