According to Natalya Kolesnikova:
> Ok, pdf-search runs!

Great!

> I try now to index .ppt and .xls Files:
> htdig.conf
> external_parsers:    application/rtf->text/html /srv/www/htdig/doc2html/doc2html.pl \
> text/rtf->text/html /srv/www/htdig/doc2html/doc2html.pl \
> application/pdf->text/html /srv/www/htdig/doc2html/doc2html.pl \
> application/vnd.ms-excel->text/html /srv/www/htdig/doc2html/doc2html.pl \
> application/vnd.ms-powerpoint->text/html /srv/www/htdig/doc2html/doc2html.pl\
> 
> ppthtml and xlhtml are working from command line ok.
> doc2html with .ppt-file or .xls-file as Argument is working ok, also.
> 
> But if I run rundig, I neither see .ppt-files nor .xls-files indexing!

Again, it would be a good idea to run htdig -ivvv with start_url set
to the URLs of a single .xls file and a single .ppt file, just to see
how it deals with these.  Pay special attention to the Content-Type
header that the server returns for each of these files, as not all web
servers follow the common convention of using application/vnd.ms-excel
and application/vnd.ms-powerpoint for these content types.  I've seen
several different variations of these, especially for Excel files.
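
If you want to check the Content-Type header outside of htdig as well,
a few lines of script will do.  What follows is only a rough sketch in
Python; the function names media_type and fetch_content_type are made
up for illustration and are not part of ht://Dig.  It sends a HEAD
request, and some servers may answer HEAD differently from the GET
that htdig issues, so treat the result as a hint and compare it with
the htdig -vvv output.

```python
import urllib.request


def media_type(header_value):
    """Return the bare, lowercased media type from a Content-Type value.

    Any parameters after ';' (for example a charset) are dropped.
    """
    if not header_value:
        return ""
    return header_value.split(";", 1)[0].strip().lower()


def fetch_content_type(url, timeout=10):
    """Send a HEAD request to url and return the media type it reports."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return media_type(response.headers.get("Content-Type"))
```

Calling fetch_content_type() with the URL of one .xls file and one
.ppt file on your server should show which content types your
external_parsers entries need to match.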

Also, never end the last line of a multi-line attribute definition with
a backslash, as it will cause htdig to swallow the following line as
part of the same definition.
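
A stray trailing backslash is easy to miss by eye, so a small check
can help.  The Python sketch below is a heuristic only, not something
shipped with ht://Dig: it assumes that a new attribute starts with
"name:" at the beginning of a line, and the function name
dangling_backslashes is invented for this example.

```python
import re

# Heuristic: an attribute definition begins with "name:".  The negative
# lookahead keeps URL schemes such as "http://" from being mistaken for one.
ATTRIBUTE_START = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*\s*:(?!//)")


def dangling_backslashes(config_text):
    """Return 1-based numbers of lines whose trailing backslash looks wrong.

    A line is flagged when it ends in a backslash and the next line is
    missing, blank, or appears to start a new attribute definition.
    """
    lines = config_text.splitlines()
    suspicious = []
    for index, line in enumerate(lines):
        if not line.endswith("\\"):
            continue
        following = lines[index + 1].strip() if index + 1 < len(lines) else ""
        if following == "" or ATTRIBUTE_START.match(following):
            suspicious.append(index + 1)
    return suspicious
```

Reading your htdig.conf into a string and passing it to this function
gives the line numbers worth a second look.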

The content types you define in your external_parsers definition must
match those your server actually returns.  You can have multiple entries
in external_parsers for a given file type, to cover all the content types
a server might plausibly use.  This helps especially when indexing
several differently-configured web servers.  E.g.:

external_parsers: \
  application/rtf->text/html /srv/www/htdig/doc2html/doc2html.pl \
  text/rtf->text/html /srv/www/htdig/doc2html/doc2html.pl \
  application/pdf->text/html /srv/www/htdig/doc2html/doc2html.pl \
  application/vnd.ms-excel->text/html /srv/www/htdig/doc2html/doc2html.pl \
  application/msexcel->text/html /srv/www/htdig/doc2html/doc2html.pl \
  application/excel->text/html /srv/www/htdig/doc2html/doc2html.pl \
  application/vnd.ms-powerpoint->text/html /srv/www/htdig/doc2html/doc2html.pl \
  application/mspowerpoint->text/html /srv/www/htdig/doc2html/doc2html.pl \
  application/powerpoint->text/html /srv/www/htdig/doc2html/doc2html.pl
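
With this many near-identical lines, it may be less error-prone to
generate the definition than to type it.  Here is a minimal sketch in
Python, assuming every content type goes through the same converter
script; build_external_parsers is a made-up helper, not an ht://Dig
tool.  It joins the entries so that every line except the last ends in
a backslash.

```python
def build_external_parsers(content_types, converter, output_type="text/html"):
    """Build a multi-line external_parsers definition as a string.

    Each entry has the form "input-type->output-type converter".  Lines are
    joined with a backslash continuation, and the last line gets none.
    """
    if not content_types:
        raise ValueError("at least one content type is required")
    entries = [
        f"  {content_type}->{output_type} {converter}"
        for content_type in content_types
    ]
    return "external_parsers: \\\n" + " \\\n".join(entries)


if __name__ == "__main__":
    types = [
        "application/vnd.ms-excel",
        "application/msexcel",
        "application/excel",
    ]
    print(build_external_parsers(types, "/srv/www/htdig/doc2html/doc2html.pl"))
```

Paste the printed text into htdig.conf in place of the hand-written
definition, adding further content types to the list as you find them.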

You may also need to customise the doc2html.pl script to allow any
non-standard content types your server returns.  Alternatively, if your
server is returning unusual content types, and you can configure the
server, then that may be the easiest/best fix.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)

