Hi Mike,

You are talking about the version with the mht parser, right? Below is an extract of my config file showing the mht-related settings; I have also attached the whole file and the parser. (Originally the parser created a separate file for each file contained in the mht. I modified it so that it only outputs the code of the htm file.) Maybe the parser I modified is sending some other garbage that the indexer can't read?

bad_extensions:   .wav .gz .z .sit .au .zip .tar .hqx .exe .com .gif \
                  .jpg .jpeg .aiff .class .map .ram .tgz .bin .rpm .mpg .mov .avi .css
valid_extensions: .html .htm .shtml .php .uhtml .phtml .txt .pdf .mht
external_parsers: application/postscript /usr/local/apache/htdocs/htdig-3.1.6/contrib/parsepdf.pl\
                  application/pdf /usr/local/apache/htdocs/htdig-3.1.6/contrib/parsepdf.pl \
                  application/mht /opt/vin/mht2html2.pl

Thanks a lot for your help!

Regards,
Ainhoa

On Feb 5, 2008 9:58 PM, <[EMAIL PROTECTED]> wrote:
> Can you show us at least an extract of your config file - as you describe
> it this should work.
>
> Regards,
> Mike
>
>
> -----Original Message-----
> From: [EMAIL PROTECTED] on behalf of Ainhoa L
> Sent: Tue 2/5/2008 4:09 PM
> To: htdig-general@lists.sourceforge.net
> Subject: [htdig] Htdig and MHT files
>
> Hi! Maybe this is a very stupid question but, is it possible to index mht
> files with htdig?
> I have tried with the mht in the valid_extensions list, etc. Obviously htdig
> doesn't take them as html and refuses to index them. I looked for a parser
> and found a mht2html parser, modified it so it just sends through output the
> html. I added it to the parsers in the htdig config file. This didn't work,
> although the parser returns valid html...
> I would like to know if there is any way to index mht files with htdig?
> Thanks a lot for your help.
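
Note on the parser question: as far as the external_parsers documentation for ht://Dig 3.1.x describes it, a program listed under a plain content type is expected to print ht://Dig's own tab-separated parser records, not HTML. A script that prints HTML would instead be declared as an external converter, using the type1->type2 form. The fragment below is only a sketch of that variant, not a confirmed fix. It assumes that the documents really arrive with the content type application/mht (the type name is taken from the extract and has not been verified against what the server sends) and that mht2html2.pl writes the HTML to standard output.

```
# Assumption: mht2html2.pl prints HTML on stdout, so it is declared as a
# converter (application/mht->text/html) rather than as a parser.
external_parsers: application/postscript /usr/local/apache/htdocs/htdig-3.1.6/contrib/parsepdf.pl \
                  application/pdf /usr/local/apache/htdocs/htdig-3.1.6/contrib/parsepdf.pl \
                  application/mht->text/html /opt/vin/mht2html2.pl
```

Running htdig with a verbose flag (for example -vvv) on a single mht URL should show which content type is actually received and whether the script is invoked at all.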

Attachments: htdig.conf, mht2html2.pl
_______________________________________________
ht://Dig general mailing list: <htdig-general@lists.sourceforge.net>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general