A few weeks ago, someone mentioned that we don't index <img alt="...">
text. I figured it would be a pretty easy addition to the HTML parser.
Along the way, I think we might be able to significantly clean up the
do_tag method in the HTML parser.

So here's how we do meta tags:

        case 20:        // "meta"
        {
            position += length;
            Configuration       conf;
            conf.NameValueSeparators("=");
            conf.Add(position);

This seems like a really good way to parse tags in general. After all,
what are tag attributes but key-value pairs? So can't we use the same
approach for most of the tags whose attributes we want? Then I could get
the alt text like this:

        Configuration   attrs;
        attrs.NameValueSeparators("=");
        attrs.Add(position);
        ...
        // "img"
        got_word(attrs["alt"]...);

Are there any hitches I'm ignoring? Since the configuration file parser
already deals with quoted values, shouldn't this work even for src
attributes?
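
To make the idea concrete, here is a minimal standalone sketch of
treating a tag's attributes as key-value pairs. It does not use the
Configuration class at all; parse_attributes is a hypothetical helper
written only for illustration, and the std::map return type is my own
assumption, not anything in the current parser. It covers the cases I
would want to check against Configuration: quoted values containing "="
or spaces, unquoted values, mixed-case names, and attributes with no
value.

```cpp
#include <cctype>
#include <map>
#include <string>

// Hypothetical helper, NOT part of ht://Dig: parse the text that follows
// a tag name (e.g. ` src="a.gif" alt="A = B">`) into a map from
// lowercased attribute name to attribute value.
std::map<std::string, std::string> parse_attributes(const std::string &s)
{
    std::map<std::string, std::string> attrs;
    std::string::size_type i = 0, n = s.size();

    while (i < n) {
        // Skip whitespace (and a stray '/') between attributes.
        while (i < n && (std::isspace((unsigned char)s[i]) || s[i] == '/'))
            ++i;
        if (i >= n || s[i] == '>')
            break;

        // Attribute name: runs up to whitespace, '=', or '>'.
        std::string name;
        while (i < n && !std::isspace((unsigned char)s[i]) &&
               s[i] != '=' && s[i] != '>') {
            name += (char)std::tolower((unsigned char)s[i]);
            ++i;
        }

        // Allow whitespace before the '='.
        while (i < n && std::isspace((unsigned char)s[i]))
            ++i;

        std::string value;
        if (i < n && s[i] == '=') {
            ++i;
            // Allow whitespace after the '='.
            while (i < n && std::isspace((unsigned char)s[i]))
                ++i;
            if (i < n && (s[i] == '"' || s[i] == '\'')) {
                // Quoted value: everything up to the matching quote,
                // so embedded '=' and spaces are preserved.
                char quote = s[i++];
                while (i < n && s[i] != quote)
                    value += s[i++];
                if (i < n)
                    ++i;    // skip the closing quote
            } else {
                // Unquoted value: up to whitespace or '>'.
                while (i < n && !std::isspace((unsigned char)s[i]) &&
                       s[i] != '>')
                    value += s[i++];
            }
        }

        // An attribute with no '=' is stored with an empty value.
        if (!name.empty())
            attrs[name] = value;
    }
    return attrs;
}
```

With something along these lines, the "img" case would only need to look
up attrs["alt"] and hand the result to the word indexer, and the "src"
value would come out of the same map. Whether Configuration behaves the
same way for valueless attributes and single-quoted values is exactly
the kind of hitch I am unsure about.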

-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
