My understanding was that the forced default charset *does* prevent browsers (or maybe just MSIE) from guessing the charset as UTF-7, UTF-7 being the special case because it is already an "escaped" encoding and hence defeats the usual tricks for escaping client-provided data. Is that not correct?
Yes and no -- it is both the source of the problem and the biggest reason that we should NOT set charset as a default.
Consider the following two resources with identical content, the first sent with
Content-Type: text/html; charset=ISO-8859-15
http://www-uxsup.csx.cam.ac.uk/~jw35/docs/cross-site-demo.html
and the second sent with only
Content-Type: text/html
http://www.ics.uci.edu/~fielding/xss-demo.html
I've tested the above with all of my browsers. Safari and MSIE-Mac do not
support UTF-7 at all. Firefox (Mac and Win) supports UTF-7, but only when it
is set manually (it does not auto-detect UTF-7, even when reading a local file).
MSIE (Windows), of course, does the least intelligent thing: it does not let
users select UTF-7 manually, but it does auto-detect and interpret UTF-7 when
the page is read from a local file, or when "auto-detect" is enabled. The
Content-Type charset parameter makes no difference here; setting charset has
no effect on MSIE's auto-detect results. In other words, MSIE is at risk of
XSS via UTF-7 only if auto-detect is enabled.
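The point in the quoted question, that UTF-7 slips past normal escaping, can be seen concretely. This Python sketch (the payload is chosen for illustration) shows that an HTML escaper finds nothing to escape in a UTF-7-encoded `<script>` tag, and that the very same bytes are harmless text when read as ISO-8859-15 but become live markup when a browser decides they are UTF-7.

```python
import html

# Illustrative attacker input: pure ASCII, containing no '<', '>', '&' or quotes.
payload = b"+ADw-script+AD4-alert(1)+ADw-/script+AD4-"

# Read as ISO-8859-15 (the declared charset), the escaper has nothing to escape,
# so the string passes through unchanged.
as_text = payload.decode("iso-8859-15")
escaped = html.escape(as_text)

# A browser that honours the declared charset shows harmless text...
print(escaped)                                   # +ADw-script+AD4-alert(1)+ADw-/script+AD4-
# ...but one that auto-detects UTF-7 decodes the same bytes into a script tag.
print(escaped.encode("ascii").decode("utf-7"))   # <script>alert(1)</script>
```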
The problem we have created is that AddDefaultCharset causes entire sites to default to one charset, usually ISO-8859-1. And because it is set by default (with no thought given to the right value), it is often left that way even on servers installed in non-Latin countries (and there is a problem in Europe too, since ISO-8859-15 is where the euro sign was added). As a result, ordinary users see wrong charset declarations in HTTP far more often, and the only "standards-compliant" remedy, short of manually adjusting the charset of every page they receive, is to turn on auto-detect! In other words, our default is now making more users vulnerable to UTF-7 XSS attacks than would be vulnerable if we never sent a default charset.
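To make the euro problem concrete: ISO-8859-15 put the euro sign at byte 0xA4, where ISO-8859-1 has the generic currency sign. This small Python sketch (the sample text is made up) shows what a reader sees when a server-wide default of ISO-8859-1 is attached to a page that was actually written in ISO-8859-15.

```python
# A page authored in ISO-8859-15 that contains a euro price (sample text is illustrative).
page_bytes = "Preis: 10 €".encode("iso-8859-15")

# What a server-wide default such as AddDefaultCharset might declare,
# versus the charset the bytes were really written in.
default_charset = "iso-8859-1"
actual_charset = "iso-8859-15"

print(page_bytes.decode(actual_charset))    # Preis: 10 €
print(page_bytes.decode(default_charset))   # Preis: 10 ¤
```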
In any case, the only tutorials on cross-site scripting that still emphasize setting the charset are our own (written by Marc) and CERT's (based on input from Marc). Those were intended as temporary workarounds until folks had a chance to fix the real problem: non-validating scripts that echo untrusted content back to users.
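The fix described above belongs in the scripts themselves. Here is a minimal Python sketch of a validating echo script, assuming a hypothetical handler `render_echo_page` and a hypothetical allowlist `ALLOWED` for a search-term field (both are illustrations, not part of any real server API). It validates input against the allowlist, escapes whatever it echoes, and declares the charset it actually encodes the body in, rather than relying on a server-wide default.

```python
import html
import re

# Hypothetical allowlist for a search-term field: word characters, spaces,
# dots and hyphens. Anything else, including the '+' that opens a UTF-7
# shift sequence, is rejected rather than echoed.
ALLOWED = re.compile(r"[\w .-]{1,100}")


def render_echo_page(untrusted: str) -> tuple[str, bytes]:
    """Return (Content-Type value, body bytes) for a page echoing a search term."""
    if not ALLOWED.fullmatch(untrusted):
        untrusted = ""  # drop invalid input instead of echoing it
    # Escape anyway, so a later loosening of ALLOWED cannot reopen the hole.
    safe = html.escape(untrusted, quote=True)
    body = f"<html><body><p>You searched for: {safe}</p></body></html>"
    # Declare the charset this script really uses for the body bytes.
    return "text/html; charset=UTF-8", body.encode("utf-8")
```

Silently dropping invalid input is just one choice; a real script might return an error page instead.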
After another afternoon of research on this one, I am now convinced
that AddDefaultCharset does far more harm than good.
....Roy
