According to Rob Strange:
> I need to index a site that includes rtf files. I found and tested a
> reasonable rtf2text converter on CPAN (RTF-Parser-1.07 by Philippe Verdret)
> and have added the following to the htdig conf file:
>
> external_parsers: application/rtf->text/plain /htdig/bin/rtf2text
>
> When I index the site, the search results still display the source code of
> the rtf, which looks terrible. Can anyone help please? Am I defining the
> external_parsers attribute incorrectly? Or any other suggestions for how I
> can index rtf files?
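Are you sure it's OK to call rtf2text directly from htdig, without a
wrapper script to handle the command line arguments? An external
converter or parser must accept its arguments in a very specific order.
htdig calls it with the input file name, the content-type, the URL and
the configuration file, and a converter must write the converted
document to standard output. See http://www.htdig.org/attrs.html#external_parsers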
You may also want to have a look at the doc2html converter, in
http://www.htdig.org/files/contrib/parsers/, which handles a number
of document types, including rtf.
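The other thing you should check is that the web server is indeed
identifying your rtf files as application/rtf, and not some other
MIME type. If it sends them as text/plain, for instance, htdig would
index the raw RTF markup, which could explain what you are seeing.
Try htdig -vvv and look at the Content-type headers the server returns.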
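If you want to check a single file outside htdig, a small script like the following could report the Content-Type the server sends. This is a sketch under some assumptions. It uses a HEAD request, which most servers answer with the same Content-Type as a normal GET. The default URL is a placeholder, not a real document.

```python
"""Sketch: report the media type a web server gives for one URL."""
import sys
import urllib.request


def media_type(header_value):
    """Strip parameters such as '; charset=...' and normalise case."""
    if header_value is None:
        return ""
    return header_value.split(";", 1)[0].strip().lower()


def fetch_content_type(url, timeout=10):
    """Send a HEAD request and return the media type the server reports."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return media_type(resp.headers.get("Content-Type"))


if __name__ == "__main__":
    url = sys.argv[1] if len(sys.argv) > 1 else "http://www.example.com/sample.rtf"
    found = fetch_content_type(url)
    # htdig picks an external parser by this type, so it should match the
    # left-hand side of the external_parsers mapping (application/rtf here).
    verdict = "matches" if found == "application/rtf" else "does not match"
    print(f"{url}: {found or '(none)'} {verdict} application/rtf")
```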
--
Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930
_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>