On 04/05/2002 10:14 PM, Markus Kuhn wrote:
> When I enter a Unicode character (Mozilla 0.9.9 nicely supports UTF-8
> cut&paste from xterm) into a bugzilla bug description, then the resulting
> web page shows these characters as human-readable numeric character
> references. Example:
> 
>   http://bugzilla.mozilla.org/show_bug.cgi?id=135762
> 
> What exactly do the W3C standards say about how Unicode characters
> entered into form fields are supposed to be submitted by the HTTP
> client to the server.

Just had a look at the HTML 4 spec, chapter "Processing form data"
(http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4):

"    * If the method is "get" and the action is an HTTP URI, the user 
agent takes the value of action, appends a `?' to it, then appends the 
form data set, encoded using the "application/x-www-form-urlencoded" 
content type. The user agent then traverses the link to this URI. In 
this scenario, form data are restricted to ASCII codes.
     * If the method is "post" and the action is an HTTP URI, the user 
agent conducts an HTTP "post" transaction using the value of the action 
attribute and a message created according to the content type specified 
by the enctype attribute."

Sounds to me like: whenever a form is submitted via "GET", the character 
set is restricted to ASCII, so no Unicode is possible here!
And it looks like the same is true whenever data is appended to the URL 
as a query string using the encoding 
"application/x-www-form-urlencoded", even in POST requests.
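You can see this restriction for yourself (a quick sketch in Python, 
not part of the spec): urlencoding percent-escapes the raw UTF-8 bytes, 
so the submitted query string itself contains only ASCII characters, 
and nothing in it tells the server which charset those escapes encode.

```python
from urllib.parse import urlencode

# "é" (U+00E9) is sent as its percent-escaped UTF-8 bytes %C3%A9;
# the resulting query string is pure ASCII, with no charset label.
query = urlencode({"comment": "héllo"})
print(query)  # comment=h%C3%A9llo
assert query.isascii()
```

The server only ever sees ASCII bytes; whether %C3%A9 was meant as 
UTF-8 or as two Latin-1 characters is left to guesswork.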
POSTing non-ASCII character data is only possible with the MIME type 
"multipart/form-data", and the user agent has to specify a 
"Content-Type", including the "charset" parameter, for the data of each 
form field.
(In practice, LYNX is the only browser I have seen so far that really 
adds the "charset" parameter to the data of the form fields.)
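For illustration, a multipart/form-data body carrying one UTF-8 text 
field could look like this (the boundary "AaB03x" is borrowed from the 
spec's own example; this is a hand-written sketch, not a capture from a 
real browser):

```
Content-Type: multipart/form-data; boundary=AaB03x

--AaB03x
Content-Disposition: form-data; name="comment"
Content-Type: text/plain; charset=UTF-8

Grüße aus Köln
--AaB03x--
```

Here the per-part "charset" parameter is what lets the server decode 
the field data unambiguously, which x-www-form-urlencoded cannot do.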
Christoph

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
