According to Natalya Kolesnikova:
> Ok, pdf-search runs! Great!
> I try now to index .ppt and .xls Files:
> htdig.conf
> external_parsers: application/rtf->text/html
> /srv/www/htdig/doc2html/doc2html.pl \
> text/rtf->text/html /srv/www/htdig/doc2html/doc2html.pl \
> application/pdf->text/html /srv/www/htdig/doc2html/doc2html.pl \
> application/vnd.ms-excel->text/html /srv/www/htdig/doc2html/doc2html.pl \
> application/vnd.ms-powerpoint->text/html
> /srv/www/htdig/doc2html/doc2html.pl\
>
> ppthtml and xlhtml are working from command line ok.
> doc2html with .ppt-file or .xls-file as Argument is working ok, also.
>
> But if I run rundig, I neither see .ppt-files nor .xls-files indexing!

Again, it would be a good idea to run htdig -ivvv with start_url set to
the URLs of a single .xls file and a single .ppt file, just to see how
it deals with these. Pay special attention to the Content-Type header
that the server returns for each of these files, as not all web servers
follow the common convention of using application/vnd.ms-excel and
application/vnd.ms-powerpoint for these content types. I've seen
several different variations of these, especially for Excel files.

Also, never end the last line of a multi-line attribute definition with
a backslash, as it will cause htdig to swallow the following line as
part of the same definition.

The content types you define in your external_parsers definition must
match those your server actually returns. You can have multiple entries
in external_parsers for a given file type just to cover all bases as
far as possible content types a server might use, especially when
indexing several differently-configured web servers.
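The two checks above (a trailing backslash, and content types that do not match what the server returns) can be roughed out in a small script before running rundig. This is only a sketch in Python, not part of ht://Dig or doc2html: the function names are made up for illustration, and it assumes the content type is compared as the bare type, with any "; charset=..." parameter ignored.

```python
import urllib.request


def parse_external_parsers(value):
    """Parse an external_parsers value into {source_type: (target_type, program)}.

    Assumes whitespace-separated pairs of 'type[->type]' and a program path,
    with backslashes used only as line continuations.
    """
    tokens = value.replace("\\", " ").split()
    if len(tokens) % 2:
        raise ValueError("expected pairs of content-type spec and program path")
    parsers = {}
    for spec, program in zip(tokens[0::2], tokens[1::2]):
        source, _, target = spec.partition("->")
        parsers[source] = (target or None, program)
    return parsers


def ends_with_continuation(value):
    """True if the last non-blank line of the definition ends with a backslash."""
    lines = [line for line in value.splitlines() if line.strip()]
    return bool(lines) and lines[-1].rstrip().endswith("\\")


def bare_content_type(header_value):
    """Drop parameters such as '; charset=...' from a Content-Type value."""
    return header_value.split(";", 1)[0].strip()


def has_parser(parsers, header_value):
    """True if the definition has an entry for this Content-Type value."""
    return bare_content_type(header_value) in parsers


def served_content_type(url):
    """Return the Content-Type a server reports for url (HEAD request)."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        return response.headers.get("Content-Type", "")


if __name__ == "__main__":
    # Definition copied by hand from htdig.conf (illustrative subset).
    definition = r"""
        application/vnd.ms-excel->text/html /srv/www/htdig/doc2html/doc2html.pl \
        application/vnd.ms-powerpoint->text/html /srv/www/htdig/doc2html/doc2html.pl
    """
    parsers = parse_external_parsers(definition)
    print("trailing backslash:", ends_with_continuation(definition))
    for header in ("application/vnd.ms-excel", "application/msexcel"):
        status = "covered" if has_parser(parsers, header) else "NOT covered"
        print(header, status)
```

served_content_type() needs network access to your server; pass it the URL of a single .xls or .ppt file and feed the result to has_parser().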
E.g., to cover several common variants:

external_parsers: \
    application/rtf->text/html /srv/www/htdig/doc2html/doc2html.pl \
    text/rtf->text/html /srv/www/htdig/doc2html/doc2html.pl \
    application/pdf->text/html /srv/www/htdig/doc2html/doc2html.pl \
    application/vnd.ms-excel->text/html /srv/www/htdig/doc2html/doc2html.pl \
    application/msexcel->text/html /srv/www/htdig/doc2html/doc2html.pl \
    application/excel->text/html /srv/www/htdig/doc2html/doc2html.pl \
    application/vnd.ms-powerpoint->text/html /srv/www/htdig/doc2html/doc2html.pl \
    application/mspowerpoint->text/html /srv/www/htdig/doc2html/doc2html.pl \
    application/powerpoint->text/html /srv/www/htdig/doc2html/doc2html.pl

You may also need to customise the doc2html.pl script to allow any
non-standard content types your server returns. Alternatively, if your
server is returning unusual content types, and you can configure the
server, then that may be the easiest/best fix.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)

_______________________________________________
ht://Dig general mailing list: <[EMAIL PROTECTED]>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general

