According to Pietro Palladino:
> I'm an Italian engineer (so sorry for my English) :-) and I'm evaluating
> htdig to use it on the website of University of Naples...It's a very good
> search engine, but I had some problems...I succeeded in indexing .doc, .rtf,
> .pdf, .ps and .ppt files, but I couldn't index .xls files.
> Actually I'm using RedHat 7.1 for the testing. These are the options that I 
> inserted in my htdig.conf file:
> 
> external_parsers: application/rtf->text/html /usr/local/scripts/doc2html.pl \
>                   text/rtf->text/html /usr/local/scripts/doc2html.pl \
>                   application/pdf->text/html /usr/local/scripts/doc2html.pl \
>                   application/postscript->text/html /usr/local/scripts/doc2html.pl \
>                   application/msword->text/html /usr/local/scripts/doc2html.pl \
>                   application/msexcel->text/html /usr/local/scripts/doc2html.pl \
>                   application/vnd.ms-excel->text/html /usr/local/scripts/doc2html.pl \
>                   application/vnd.ms-powerpoint->text/html /usr/local/scripts/doc2html.pl
> 
> I installed xlHtml-0.2.6-2 as an excel parser. In this package there's a ppt 
> parser too (pptHtml). The thing I can't understand is that the pptHtml works 
> fine (when it's called from doc2html.pl) but xlHtml doesn't work :-(((.
> I tested it from the command line and it works great....but I don't know why 
> it doesn't work when called from doc2html.pl. In this Perl script, the lines 
> concerning the parsing are almost the same for both .ppt and .xls files....

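Do you know for certain that your web server returns a Content-Type header
of application/vnd.ms-excel (or application/msexcel) for a .xls file?  htdig
picks the external parser by the content-type the server reports, so if the
server sends some other type, the .xls entries in external_parsers will not
match and that could be the problem.  You can also try running...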

  /usr/local/scripts/doc2html.pl /path/to/some/file.xls \
    application/vnd.ms-excel http://myhost/some/file.xls /path/to/htdig.conf

from the command line to see if doc2html properly spits out text from
your spreadsheet, in the form of an HTML file.
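
To check the content-type question directly, something like the following
might help.  It is only a rough sketch: the content_type_of function name is
made up for illustration, and it assumes curl is installed and that your
server answers HEAD requests the same way it answers htdig's requests.

```shell
# Hypothetical helper: print the Content-Type value found in a set of
# HTTP response headers read from standard input.
content_type_of() {
  # Strip carriage returns, then print the value of the first
  # Content-Type header (the header name is matched case-insensitively).
  tr -d '\r' | awk -F': *' 'tolower($1) == "content-type" { print $2; exit }'
}

# Example use, with the same placeholder URL as above (needs curl):
#   curl -sI http://myhost/some/file.xls | content_type_of
```

If that prints something other than one of the types listed in your
external_parsers attribute, the server's MIME type mapping for .xls is the
first thing to look at.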

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

