According to Mathieu Peltier:
> I would like not to index some part of a document but still follow links. I
> have read in the FAQ that <noindex follow> tag can be used for that:
> ...
> <noindex follow>
> bla bla
> </noindex>
> ...
>
> The problem is that the HTML document then no longer conforms to the HTML
> DTD. So I wonder if there is another way to do that? <!--htdig_noindex-->
> can be used to prevent indexing, but then htdig will not follow links
> either. And it seems that no <!--htdig_noindex_follow--> tag exists. Am I
> missing something?

No, I think you've pretty much summed up the current state of
things in the HTML parser. It wouldn't be that difficult to modify
the htdig/HTML.cc parser code to handle new tags. However, because
comment tags are stripped out before any parsing is done, it's tricky
to add new comment-style tags as you suggest.

However, there's a little-known and seldom-used feature in htdig for
preprocessing HTML files before the internal parser looks at them, which
opens up all sorts of possibilities for supporting new tags or stripping out
bits of files. For example, if you add

external_parsers: text/html->text/html-internal /path/to/changehtml.sh

to your config file, you can preprocess all HTML files with this
changehtml.sh shell script:

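#!/bin/sh
# htdig passes the path of the file to convert as the first argument;
# the converted HTML goes to standard output.
sed -e 's|<!--htdig_noindex_follow-->|<noindex follow>|g' \
    -e 's|<!--/htdig_noindex_follow-->|</noindex>|g' "$1"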
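
If you want to check the two substitutions before pointing htdig at the
script, something like the following sketch should do. It is only a test
harness, not part of ht://Dig: the function name rewrite_noindex_comments
and the sample document are made up for illustration, and it assumes your
system has mktemp.

```shell
#!/bin/sh
# Hypothetical test harness for the comment-to-tag rewrite; not part of ht://Dig.

# Same two substitutions as changehtml.sh, wrapped in a function so they can
# be tried without running htdig. $1 is the path of the HTML file to convert.
rewrite_noindex_comments() {
    sed -e 's|<!--htdig_noindex_follow-->|<noindex follow>|g' \
        -e 's|<!--/htdig_noindex_follow-->|</noindex>|g' "$1"
}

# Build a small sample document in a temporary file, removed on exit.
sample=$(mktemp) || exit 1
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
<p>indexed text</p>
<!--htdig_noindex_follow-->
<a href="other.html">navigation link</a>
<!--/htdig_noindex_follow-->
EOF

# Print the rewritten document on standard output.
rewrite_noindex_comments "$sample"
```

The two comment lines should come out as <noindex follow> and </noindex>,
with everything else passed through unchanged. The source files themselves
keep only the comment markers, so they stay valid against the HTML DTD.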

For more elaborate examples of HTML preprocessing, see our unhypermail.sh
script at http://www.htdig.org/files/contrib/parsers/ or the ungeoify.sh
script (and the companion geoupdate.sh script) at:

http://www.htdig.org/files/contrib/scripts/README.geoupdate-ungeoify
http://www.htdig.org/files/contrib/scripts/geoupdate.sh
http://www.htdig.org/files/contrib/scripts/ungeoify.sh

We use these scripts to index the GeoCrawler and SourceForge mailing
list archives for ht://Dig in the search engine at www.htdig.org.

--
Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)
_______________________________________________
ht://Dig general mailing list: <[EMAIL PROTECTED]>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general