A few weeks ago, someone mentioned that we don't index <img alt="...">
text. I figured it would be a pretty easy addition to the HTML parser.
Along the way, I think we might be able to significantly clean up the
do_tag method in the HTML parser.
So here's how we do meta tags:
case 20: // "meta"
{ position += length;
Configuration conf;
conf.NameValueSeparators("=");
conf.Add(position);
So this seems like a really good way to parse the tags in general. After
all, what are tag attributes but key-value pairs? Thus, can't we just use
this for most of the tags where we want the attributes? Then I could get
the alt text like this:
Configuration attrs;
attrs.NameValueSeparators("=");
attrs.Add(position);
...
// "img"
got_word(attrs["alt"]...);
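To make the idea concrete, here is a minimal standalone sketch of the key-value split a generic attribute parser would need to do. It deliberately does not use ht://Dig's Configuration class, because its exact parsing rules aren't shown here. The function name ParseTagAttributes is hypothetical. It assumes it receives the text between '<' and '>' with the element name first. Names are lowercased because HTML attribute names are case-insensitive. Values may be double-quoted, single-quoted, or bare, and attributes with no value map to "".

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <string>

// Hypothetical helper, not ht://Dig's Configuration class: turn the text
// inside a tag, e.g.  img src="a b.gif" ALT='Logo' border=0 ,
// into a map from lowercase attribute name to unquoted value.
std::map<std::string, std::string> ParseTagAttributes(const std::string& tag)
{
    auto space = [](char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; };
    std::map<std::string, std::string> attrs;
    std::string::size_type i = 0, n = tag.size();

    // Skip the element name ("img", "meta", ...).
    while (i < n && !space(tag[i]))
        ++i;

    while (i < n) {
        while (i < n && space(tag[i]))
            ++i;
        if (i >= n)
            break;

        // Attribute names are case-insensitive in HTML, so fold to lowercase.
        std::string name;
        while (i < n && tag[i] != '=' && !space(tag[i]))
            name += static_cast<char>(std::tolower(static_cast<unsigned char>(tag[i++])));

        // Look for an optional "=value", allowing spaces around '='.
        std::string::size_type j = i;
        while (j < n && space(tag[j]))
            ++j;
        std::string value;
        if (j < n && tag[j] == '=') {
            i = j + 1;
            while (i < n && space(tag[i]))
                ++i;
            if (i < n && (tag[i] == '"' || tag[i] == '\'')) {
                const char quote = tag[i++];     // quoted: may contain spaces and '='
                while (i < n && tag[i] != quote)
                    value += tag[i++];
                if (i < n)
                    ++i;                         // skip the closing quote
            } else {
                while (i < n && !space(tag[i]))  // unquoted: ends at whitespace
                    value += tag[i++];
            }
        }
        if (!name.empty())
            attrs[name] = value;                 // bare attributes map to ""
    }
    return attrs;
}
```

With something like this, the "img" case would only need to look up attrs["alt"] before handing the text to got_word. The same call would cover the name/content pair of a meta tag.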
Are there any hitches I'm ignoring? Since the configuration files deal
with quoted values, shouldn't this work even for src attributes?
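One possible hitch with quoted values happens before attribute parsing: deciding where the tag ends. A src URL or alt text can legally contain '=' or '>' inside quotes, so a scanner that stops at the first '>' would cut such a tag short. The sketch below uses a hypothetical helper, FindTagEnd, that is not existing ht://Dig code. It skips '>' characters inside quoted values. It assumes a quote only opens a value when it directly follows '=' (ignoring whitespace), so an apostrophe in an unquoted value such as alt=don't does not swallow the rest of the document.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper (not ht://Dig code): return the index of the '>'
// that closes the tag starting at html[start], ignoring any '>' inside a
// quoted attribute value. Returns std::string::npos if the tag never closes.
std::string::size_type FindTagEnd(const std::string& html, std::string::size_type start)
{
    char quote = '\0';  // active quote character, or '\0' when outside quotes
    char last = '\0';   // last non-whitespace character seen outside quotes
    for (std::string::size_type i = start; i < html.size(); ++i) {
        const char c = html[i];
        if (quote != '\0') {
            if (c == quote) {       // closing quote ends the value
                quote = '\0';
                last = c;
            }
            continue;
        }
        if ((c == '"' || c == '\'') && last == '=')
            quote = c;              // quote right after '=' opens a value
        else if (c == '>')
            return i;               // unquoted '>' closes the tag
        if (!std::isspace(static_cast<unsigned char>(c)))
            last = c;
    }
    return std::string::npos;
}
```

The text between the '<' and the returned index could then be passed to an attribute parser. Whether the existing do_tag code already handles this case isn't clear from the snippets above.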
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
------------------------------------
To unsubscribe from the htdig3-dev mailing list, send a message to
[EMAIL PROTECTED] containing the single word "unsubscribe" in
the SUBJECT of the message.