On Tue, Apr 26, 2005 at 10:31:42AM -0400, Stas Bekman wrote:
>>> Since when unescaped & in the QUERY_STRING part of the URL are not allowed?
>> I dunno the specifics, but if you try using the w3c validator you end up
>> with something like this
>>   reference not terminated by REFC delimiter
>>   <a href="http://example.com/foo.pl?foo=bar&reg=foobar">is this valid?</a>
>>   If you meant to include an entity that starts with "&", then you should
>>   terminate it with ";". Another reason for this error message is that you
>>   inadvertently created an entity by failing to escape an "&" character just
>>   before this text.
> OK, in which case it must be some relatively recent change, since an 
> unescaped & in the QUERY_STRING was a valid separator. A pointer to the 
> relevant RFC would be nice so we can add that to the URL that started this 
> thread.

Actually, I think it's been like this since the beginning, but it's
one of those things that browsers are very forgiving about, so most
people never bump into problems with this.

Yes, & is a valid separator in a query_string, but when you include
this query_string in an HTML-document, then the query_string has to
follow the same rules as the rest of the document, which means that &
must be escaped.
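
To make the two layers concrete, here is a minimal Python sketch (my own
illustration, not taken from any spec). It first builds the query string
with plain & separators, then escapes the whole URL for the HTML layer.
The helper name href_for and the example parameters are made up for this
illustration.

```python
from html import escape
from urllib.parse import urlencode


def href_for(base_url, params):
    """Return an <a> tag whose href is escaped for inclusion in HTML.

    href_for is a hypothetical helper used only for this illustration.
    """
    # Layer 1: the URL itself. '&' is the ordinary separator here.
    url = base_url + "?" + urlencode(params)
    # Layer 2: the HTML document. '&' must become '&amp;' in the attribute.
    return '<a href="{}">link</a>'.format(escape(url, quote=True))


print(href_for("http://example.com/foo.pl", {"foo": "bar", "reg": "foobar"}))
# prints: <a href="http://example.com/foo.pl?foo=bar&amp;reg=foobar">link</a>
```

The browser undoes the second layer when it reads the attribute, so the
server still sees foo=bar&reg=foobar in the request.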

The HTML-spec says:

http://www.w3.org/TR/1998/REC-html40-19980424/sgml/dtd.html#URI

URI must be of the type CDATA

and CDATA-content must be treated like this:

http://www.w3.org/TR/1998/REC-html40-19980424/types.html#type-cdata

 - Replace character entities with characters,
 - Ignore line feeds,
 - Replace each carriage return or tab with a single space. 
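
As a rough sketch of what those three steps do to an attribute value,
something like the following Python would approximate it. Two assumptions
on my part: the order in which the steps are applied, and the use of
Python's html.unescape, which follows HTML5 decoding rules rather than
HTML 4 and is more lenient than a browser may be inside an attribute.
The function name attribute_value is made up.

```python
from html import unescape


def attribute_value(raw):
    """Approximate the CDATA attribute processing listed above.

    Hypothetical helper; the step order is an assumption.
    """
    # Ignore line feeds.
    value = raw.replace("\n", "")
    # Replace each carriage return or tab with a single space.
    value = value.replace("\r", " ").replace("\t", " ")
    # Replace character entities with characters.
    return unescape(value)


print(attribute_value("foo.pl?foo=bar&amp;reg=foobar"))
print(attribute_value("foo.pl?foo=bar&reg=foobar"))
```

With this decoder the escaped form comes back as foo.pl?foo=bar&reg=foobar,
while the unescaped form has its "&reg" read as the (R) sign entity and
comes back as foo.pl?foo=bar®=foobar. That is the kind of confusion the
validator message is warning about.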

This doesn't say that you _must_ encode certain characters, like &, to
entities, though, only that browsers must decode the entities that
you've encoded.

Likewise, the documentation on entities only recommends encoding &,
including in CDATA attribute values:

http://www.w3.org/TR/1998/REC-html40-19980424/charset.html#h-5.3.2

  Authors should use "&amp;" (ASCII decimal 38) instead of "&" to avoid
  confusion with the beginning of a character reference (entity
  reference open delimiter). Authors should also use "&amp;" in
  attribute values since character references are allowed within CDATA
  attribute values.


However, the XHTML-spec clearly states that & must be written as an
entity reference (&amp;) within attribute values:

http://www.w3.org/TR/xhtml1/#C_12

-- 
Trond Michelsen
