Sylvain Wallez wrote:
Marc Portier wrote:
Hi all,
we seem to have a small inconsistency concerning the encoding of HTML forms:
- our HTML serializer by default is using the UTF-8 encoding.
(in fact it is set nowhere in the system and is thus left to Xalan, which most likely takes the easy path of assuming the default from XML land?)
- not setting the form-encoding parameter in Cocoon's web.xml means we default to assuming that browsers send the request params in the ISO-8859-1 encoding (CocoonServlet.java line 500)
I encountered this problem and discovered that browsers (at least IE6 & Mozilla) send form content using the encoding of the HTML page. But the problem is that no header tells the server about the used encoding.
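To make the mismatch concrete, here is a minimal sketch of what happens when a form on a UTF-8 page is submitted and the server decodes the parameters as ISO-8859-1. The class and method names (FormEncodingMismatch, submit, receive) are made up for illustration and are not part of Cocoon or the servlet API; only java.net.URLEncoder and java.net.URLDecoder are real.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class, not taken from Cocoon.
class FormEncodingMismatch {

    // Roughly what a browser does: percent-encode the value using the
    // encoding of the page that contained the form.
    static String submit(String value, String pageEncoding) {
        try {
            return URLEncoder.encode(value, pageEncoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown encoding: " + pageEncoding, e);
        }
    }

    // Roughly what the server side does: percent-decode the value using
    // whatever encoding it assumes the browser used.
    static String receive(String wire, String assumedEncoding) {
        try {
            return URLDecoder.decode(wire, assumedEncoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown encoding: " + assumedEncoding, e);
        }
    }

    public static void main(String[] args) {
        // "caf\u00e9" is "cafe" with an acute accent on the e.
        String wire = submit("caf\u00e9", "UTF-8");
        System.out.println(wire); // caf%C3%A9

        // Same assumption on both sides: the value survives.
        System.out.println(receive(wire, "UTF-8").equals("caf\u00e9"));

        // Server assumes ISO-8859-1: the two UTF-8 bytes become two characters.
        System.out.println(receive(wire, "ISO-8859-1").equals("caf\u00c3\u00a9"));
    }
}
```

Nothing on the wire says which of the two decodings is the right one, which is exactly the missing-header problem described above.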
indeed, this is a known issue; see for instance the servlet 2.3 spec, section SRV 4.9 "Request Data Encoding"
Cocoon even has a built-in mechanism to survive the issue on servlet 2.2 installations
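The general trick behind such a workaround can be sketched as follows: if the container has already decoded the bytes as ISO-8859-1, turn the string back into bytes with that same encoding and decode them again with the encoding the form really used. This is only an illustration of the technique, assuming the form was sent as UTF-8; ParameterRecoder and recode are hypothetical names, not the actual Cocoon code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating the re-decoding trick.
class ParameterRecoder {

    // Undo a decode done with containerEncoding and redo it with formEncoding.
    // This is lossless when containerEncoding is ISO-8859-1, because that
    // encoding maps every byte value to exactly one character and back.
    static String recode(String value, Charset containerEncoding, Charset formEncoding) {
        if (value == null) {
            return null;
        }
        byte[] raw = value.getBytes(containerEncoding);
        return new String(raw, formEncoding);
    }

    public static void main(String[] args) {
        // What the container handed us after wrongly assuming ISO-8859-1.
        String garbled = "caf\u00c3\u00a9";
        String fixed = recode(garbled, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(fixed.equals("caf\u00e9")); // true
    }
}
```

Note that this only works if the application knows (by contract) which encoding the form used; it cannot detect it.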
What is the supposed way of writing portable applications that automagically find the correct encoding?
the supposed way is to treat the URI contract as covering not only the URI and the allowed request parameters but also the encoding in which those request params are expected!
so you expect the end-users of your application to be setting the encoding in their browser according to that contract :-)
in practice this means that
1/ the one generating the html form makes sure he applies that very encoding on the way out
2/ we all expect that the browser will do a correct auto-detection and the end-user doesn't (know about how to) change that encoding manually before submitting the form
the awkward thing is that the HTTP spec has room for letting the browser communicate what was used as encoding (and the servlet 2.3 implementation should take that into account) BUT NONE OF THE BROWSERS DO IT.
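For illustration, the room the spec leaves is the charset parameter on the request's Content-Type header. A server that wanted to honour it, and fall back to the contract encoding when it is absent (which, as said, is the usual case), could do something along these lines. RequestCharset and charsetOf are hypothetical names and the parsing is deliberately simplified; a real servlet 2.3 container exposes this through the request's getCharacterEncoding() instead.

```java
import java.util.Locale;

// Hypothetical helper: simplified parsing of a Content-Type header value.
class RequestCharset {

    // Returns the charset parameter of the header value, or the fallback
    // (the encoding agreed on in the contract) when none is present.
    static String charsetOf(String contentType, String fallback) {
        if (contentType == null) {
            return fallback;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type itself; parameters start at index 1.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = param.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? fallback : value;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // What a well-behaved browser could send:
        System.out.println(charsetOf("application/x-www-form-urlencoded; charset=UTF-8", "ISO-8859-1")); // UTF-8
        // What browsers actually send:
        System.out.println(charsetOf("application/x-www-form-urlencoded", "ISO-8859-1")); // ISO-8859-1
    }
}
```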
sigh, it is the same kind of historic 'wrong' as
- wrong implementations of 302 redirects (HTTP 1.1 introduced 307 to allow room for the correct implementation of what HTTP 1.0 intended 302 to be)
http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html (see note inside 10.3.3)
- the wrong spelling of referrer in 'http_referer' (should have been two r's )
http://www.google.com/search?q=http_referer+spelling&sourceid=mozilla-search&start=0&start=0&ie=utf-8&oe=utf-8
so, welcome to the web:
we create specs so fast that we can't be bothered with the spelling! (or the correct implementation)
Wobbly me doesn't mind that much about the folkloristic spelling part ;-)
-marc=
--
Marc Portier
http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at http://radio.weblogs.com/0116284/
