Hi Mike,

You are talking about the version with the mht parser, right?
Below is an extract of the config file showing where I mention mht, and I
have attached the whole file and the parser. (Originally the parser created
a separate file for each file contained in the mht; I modified it so that it
only outputs the HTML code of the htm file.) Maybe the parser I modified is
sending some other garbage that the indexer can't read?
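
For reference, the modification described above (emitting only the HTML
stored inside the mht file and nothing else) can be sketched roughly as
follows. This is a Python illustration, not the attached Perl script: the
function name extract_html_from_mht is hypothetical, and the sketch assumes
the mht file is an ordinary MIME multipart archive containing a text/html
part.

```python
import email
from email import policy


def extract_html_from_mht(mht_bytes: bytes) -> str:
    """Return the first text/html part of an MHT (MIME HTML) archive.

    Hypothetical helper for illustration only. Returns an empty string
    when the archive contains no text/html part.
    """
    # Parse the archive as a MIME message; policy.default gives us
    # EmailMessage objects with transfer-encoding and charset handling.
    message = email.message_from_bytes(mht_bytes, policy=policy.default)

    # Walk every MIME part and skip images, stylesheets, and so on.
    for part in message.walk():
        if part.get_content_type() == "text/html":
            # get_content() undoes quoted-printable/base64 and decodes
            # the bytes using the charset declared on the part.
            return part.get_content()
    return ""
```

Anything besides the decoded HTML (MIME headers, boundary lines, base64
image data) is left out of the result, which is the kind of "garbage" that
could otherwise reach the indexer.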

bad_extensions: .wav .gz .z .sit .au .zip .tar .hqx .exe .com .gif \
.jpg .jpeg .aiff .class .map .ram .tgz .bin .rpm .mpg .mov .avi .css

valid_extensions: .html .htm .shtml .php .uhtml .phtml .txt .pdf .mht

external_parsers: application/postscript /usr/local/apache/htdocs/htdig-3.1.6/contrib/parsepdf.pl \
application/pdf /usr/local/apache/htdocs/htdig-3.1.6/contrib/parsepdf.pl \
application/mht /opt/vin/mht2html2.pl

Thanks a lot for your help!
Regards,

Ainhoa



On Feb 5, 2008 9:58 PM, <[EMAIL PROTECTED]> wrote:

> Can you show us at least an extract of your config file - as you describe
> it this should work.
>
> Regards,
> Mike
>
>
> -----Original Message-----
> From: [EMAIL PROTECTED] on behalf of Ainhoa L
> Sent: Tue 2/5/2008 4:09 PM
> To: htdig-general@lists.sourceforge.net
> Subject: [htdig] Htdig and MHT files
>
> Hi! Maybe this is a very stupid question, but is it possible to index mht
> files with htdig?
> I have tried adding mht to the valid_extensions list, etc. Obviously htdig
> doesn't treat them as html and refuses to index them. I looked for a parser
> and found an mht2html parser, and modified it so that it writes only the
> html to its output. I added it to the parsers in the htdig config file.
> This didn't work, although the parser returns valid html...
> Is there any way to index mht files with htdig?
> Thanks a lot for your help.
>
>

Attachment: htdig.conf
Description: Binary data

Attachment: mht2html2.pl
Description: Binary data

_______________________________________________
ht://Dig general mailing list: <htdig-general@lists.sourceforge.net>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general
