http://issues.apache.org/bugzilla/show_bug.cgi?id=30621

HTML Parser doesn't decode character references in attributes

------- Additional Comments From [EMAIL PROTECTED] 2004-09-07 09:15 -------

I didn't attach my patch because, although it fixes the problem for us, it's a kludge (IMO). The text inside an attribute value is being parsed twice: the grammar definition treats it as a simple string, and my patch then parses that string to resolve the character references in <img alt="..."> and <meta content="...">; other attribute values are left alone.

I suspect that character references should be resolved in other attribute values such as <meta name="..."> even though it should never be necessary to use a character reference here. The HTML definition isn't entirely clear - perhaps the SGML standard is clearer.

Since the HTML parser is an example, it shouldn't include kludges like this (again, IMO). The grammar describing an attribute value ought to be correct. Since I needed a quick fix, the kludge is sufficient for me. No one else has complained (yet), so I don't see any need to rush a poor solution into the released product.
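
For readers who want to picture the two-pass approach described above, here is a rough sketch in Python. It is not the unattached patch and not the parser's own code: the names `decode_char_refs`, `attribute_value`, and `DECODED_ATTRIBUTES` are hypothetical, and the entity table comes from Python's standard `html.entities` module rather than from the parser's grammar. It assumes the first pass has already produced the raw attribute string.

```python
import re
from html.entities import name2codepoint

# Matches decimal (&#38;), hexadecimal (&#x26;) and named (&amp;) references.
_CHAR_REF = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")

# Assumption: only these (tag, attribute) pairs get the second pass,
# mirroring the two cases named in the comment above.
DECODED_ATTRIBUTES = {("img", "alt"), ("meta", "content")}


def decode_char_refs(value):
    """Second pass: resolve character references in a raw attribute string."""

    def replace(match):
        body = match.group(1)
        try:
            if body[:2] in ("#x", "#X"):
                return chr(int(body[2:], 16))
            if body.startswith("#"):
                return chr(int(body[1:]))
        except (ValueError, OverflowError):
            # Code point out of range: leave the reference untouched.
            return match.group(0)
        code_point = name2codepoint.get(body)
        # Unknown entity names are also left untouched.
        return chr(code_point) if code_point is not None else match.group(0)

    return _CHAR_REF.sub(replace, value)


def attribute_value(tag, name, raw):
    """Return the attribute value, decoded only for the selected attributes."""
    if (tag.lower(), name.lower()) in DECODED_ATTRIBUTES:
        return decode_char_refs(raw)
    return raw
```

With this sketch, `attribute_value("img", "alt", "Caf&eacute; &#38; bar")` returns `"Café & bar"`, while `attribute_value("meta", "name", "a&amp;b")` is returned unchanged. That inconsistency is exactly the one the comment objects to; a grammar that recognised character references inside attribute values would remove the need for the whitelist.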