When I enter Unicode characters (Mozilla 0.9.9 nicely supports UTF-8
cut & paste from xterm) into a bugzilla bug description, the resulting
web page shows these characters as human-readable numeric character
references. Example:

  http://bugzilla.mozilla.org/show_bug.cgi?id=135762
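To illustrate what I mean (my own toy example, not taken from the bug
above): a non-ASCII character that went in as UTF-8 comes back in the
rendered page as a decimal numeric character reference, as if the page
had been run through something like this:

```python
def to_ncr(s):
    # Replace every non-ASCII character with a decimal numeric
    # character reference (NCR), e.g. U+6C34 "water" -> &#27700;
    return "".join(c if ord(c) < 128 else "&#%d;" % ord(c) for c in s)

print(to_ncr("水"))  # -> &#27700;
```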

What exactly do the W3C standards say about how Unicode characters
entered into form fields are supposed to be submitted by the HTTP
client to the server?

I understand that HTTP/1.0 requests can carry MIME headers that
indicate the encoding of any text body that follows, but form field
values are submitted in URL-encoded form and not as a plain text body.
What character encoding is used here, and how can I represent Unicode
characters in a standards-conforming way? Does some standard require
the use of NCRs here, or is bugzilla just sloppy in not decoding them
properly?
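My current understanding (an assumption on my part, not a quote from
any standard) is that a browser serving a UTF-8 page encodes each form
value as the percent-encoded UTF-8 byte sequence of the characters, in
an application/x-www-form-urlencoded request. A minimal sketch of that
behavior, with a hypothetical "comment" field:

```python
from urllib.parse import quote_plus

def form_encode(fields, charset="utf-8"):
    # Sketch of assumed browser behavior: encode each key and value in
    # the page's charset, then percent-encode the resulting bytes, as
    # in an application/x-www-form-urlencoded submission.
    return "&".join(
        "%s=%s" % (quote_plus(k.encode(charset)), quote_plus(v.encode(charset)))
        for k, v in fields.items()
    )

print(form_encode({"comment": "水"}))  # -> comment=%E6%B0%B4
```

If that is what browsers actually do, then the server sees raw UTF-8
bytes after percent-decoding, and the NCRs in the bugzilla page would
be bugzilla's doing, not the transport's.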

Please try this with other web interfaces!

Markus

P.S.: Mozilla 0.9.9 looks very promising. PostScript printing of UTF-8
pages has improved significantly, but still needs very careful testing
(there might be a few bugs lurking in the encoding mapping tables they
use).

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
