According to Rob Strange:
> I need to index a site that includes rtf files. I found and tested a
> reasonable rtf2text converter on CPAN (RTF-Parser-1.07  by Philippe Verdret)
> and have added the following to the htdig conf file:
> 
> external_parsers:       application/rtf->text/plain /htdig/bin/rtf2text
> 
> When I index the site, the search results still display the source code of
> the rtf, which looks terrible. Can anyone help please? Am I defining the
> external_parsers attribute incorrectly? Or any other suggestions for how I
> can index rtf files?

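Are you sure it's OK to call rtf2text directly from htdig, without a
wrapper script to handle the command-line arguments?  htdig calls an
external converter or parser with four arguments in a fixed order (the
name of the file holding the document, its content type, its URL, and
the configuration file), and a converter must write its output to
standard output.  A standalone converter like rtf2text probably doesn't
expect that calling convention.  See http://www.htdig.org/attrs.html#external_parsers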

You may also want to have a look at the doc2html converter, in
http://www.htdig.org/files/contrib/parsers/, which handles a number
of document types, including rtf.

The other thing you should do is make sure the web server is indeed
identifying your rtf files as application/rtf, and not as some other
MIME type, since external_parsers is matched on the content type the
server reports.  Run htdig -vvv and have a look at the Content-Type
headers the server returns.
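For a quick spot check outside htdig, a small Python sketch like the one below sends a HEAD request and prints the bare MIME type the server reports. It is only a complement to htdig -vvv: some servers answer HEAD differently from GET, and the URLs you pass it are your own. A result such as text/rtf would not match an application/rtf entry.

```python
"""Hypothetical spot check of the Content-Type a server reports for RTF URLs."""
import sys
import urllib.request


def mime_type(header_value):
    """Return the bare, lower-cased MIME type from a Content-Type value."""
    return header_value.split(";", 1)[0].strip().lower()


def served_type(url):
    """Send a HEAD request and return the server's bare MIME type for url."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        return mime_type(resp.headers.get("Content-Type", ""))


# Usage: python check_type.py http://your.server/some/file.rtf ...
if __name__ == "__main__" and len(sys.argv) > 1:
    for url in sys.argv[1:]:
        t = served_type(url)
        note = "" if t == "application/rtf" else "  <- not application/rtf"
        print(f"{url}: {t or '(none)'}{note}")
```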

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
