According to Anthony E.:
> Well, good news with this "true bug" if you will...
>
> the word "true" was coming from the 'alt'
> attribute of the html 'img' tag...
>
> I didn't have anything in the "alt" attribute - it was
> empty, resulting in boolean test = true for htdig
>
> My actual html tag was <img alt src="img.gif">
I believe this tag is incorrect. To specify empty alt text, don't you
have to write alt="" explicitly, rather than a bare alt? If a bare alt
with no value after it is supposed to imply alt="", then we'll have to
fix the htdig code. I checked the HTML 4 standard, and it's not 100%
clear to me whether it's valid to "minimize" any attribute in this way,
or what the implied value should be when an attribute is minimized. It
says "boolean attributes may appear in minimized form", but alt is not
a boolean attribute. However, I have seen some attributes, like border
on a table tag, appear either with a numeric value or minimized (which
the HTML 4 standard seems to say is OK).

In XHTML, minimizing attributes in this way is no longer valid, so pages
that do this should eventually be changed. That's a bit beside the
point, though, as we need to handle older HTML versions correctly too.
The question remains: what is the correct usage in this context?
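
To make the difference between the two forms concrete, here is a small
sketch using Python's standard html.parser module. It is not htdig's
code (htdig is written in C++) and it says nothing about what the HTML 4
standard requires; it only shows how one lenient parser reports a bare
alt versus alt="". The AttrDump class and img_attrs helper are names
made up for this illustration.

```python
from html.parser import HTMLParser


class AttrDump(HTMLParser):
    """Record the attributes of every start tag the parser sees."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; the value is None
        # when the attribute was written with no "=value" part.
        self.seen.append((tag, dict(attrs)))


def img_attrs(markup):
    """Parse a fragment and return the recorded (tag, attributes) pairs."""
    parser = AttrDump()
    parser.feed(markup)
    parser.close()
    return parser.seen


minimized = img_attrs('<img alt src="img.gif">')
explicit = img_attrs('<img alt="" src="img.gif">')

print(minimized)  # alt is reported as None (no value given)
print(explicit)   # alt is reported as '' (explicitly empty)
```

This parser keeps "no value" and "empty value" distinct, so it is up to
the caller to decide what a bare alt should mean.
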
> when I changed it to alt="img", the word "img" came up
> all over the search results in htdig.
That's to be expected.
> I have a bunch of spacer.gif files (1x1 clear images
> for spacing) at the top of the page.
>
> Is there any way to turn off the inclusion of the <img>
> tag while indexing files?? This would completely
> eradicate this bug...as I don't want the 'alt'
> attribute of '<img>' tags to appear as part of the
> indexing scheme.
Well, if changing all these files to specify alt="" explicitly is not an
option, then you can do the fix Geoff suggested. However, if someone can
make a case that a minimized alt attribute should be considered valid,
and should be taken as an empty string, we'll need to change the attribute
parsing code to handle non-boolean minimized attributes.
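
As a rough sketch of what "taken as an empty string" could look like,
the following Python uses the standard html.parser module to collect
indexable words from img alt text, treating a bare alt the same as
alt="". This is an illustration only, not htdig's parser and not a
patch for it. The AltWordCollector class, the alt_words helper, and the
index_alt switch are all hypothetical names; index_alt in particular is
not an htdig configuration attribute.

```python
import re
from html.parser import HTMLParser


class AltWordCollector(HTMLParser):
    """Collect lowercase words found in the alt text of img tags."""

    def __init__(self, index_alt=True):
        super().__init__()
        self.index_alt = index_alt  # hypothetical switch, not an htdig option
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag != "img" or not self.index_alt:
            return
        for name, value in attrs:
            if name == "alt":
                # A minimized attribute arrives as None; treat it as an
                # empty string instead of substituting a word like "true".
                text = value or ""
                self.words.extend(re.findall(r"\w+", text.lower()))


def alt_words(markup, index_alt=True):
    """Return the words that would be indexed from img alt text."""
    collector = AltWordCollector(index_alt)
    collector.feed(markup)
    collector.close()
    return collector.words


print(alt_words('<img alt src="spacer.gif">'))        # no words collected
print(alt_words('<img alt="Site logo" src="l.gif">'))  # words from the alt text
```

With this policy, a page full of spacer images written as <img alt ...>
contributes no words at all, and passing index_alt=False skips alt text
entirely.
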
--
Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930