When I enter a Unicode character (Mozilla 0.9.9 nicely supports UTF-8 cut & paste from xterm) into a bugzilla bug description, the resulting web page shows these characters as human-readable numeric character references. Example:
http://bugzilla.mozilla.org/show_bug.cgi?id=135762

What exactly do the W3C standards say about how Unicode characters entered into form fields are supposed to be submitted by the HTTP client to the server?

I understand that HTTP/1.0 requests can carry MIME headers that indicate the encoding used in any message body that follows, but as far as I can tell the form field values are provided in the URL of the POST request and not in a message body. What character encoding is used there, and how can I represent Unicode characters in a standards-conforming way? Does the HTTP standard require the use of NCRs here, and is bugzilla just being sloppy in not decoding them properly?

Please try this with other web interfaces!

Markus

P.S.: Mozilla 0.9.9 looks very promising. PostScript printing of UTF-8 pages has improved significantly, but it still needs very careful testing (there might be a few bugs lurking in the encoding mapping tables it uses).

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
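For concreteness, the two candidate behaviours in question can be sketched in Python (a hypothetical illustration, not what Mozilla or bugzilla actually does): a client can percent-encode the raw UTF-8 bytes of a form value, or it can first replace each character outside the form's charset with a numeric character reference and percent-encode that.

```python
# Sketch (illustration only, not Mozilla's actual code): two ways a
# browser might transmit a non-ASCII form value such as U+2603 SNOWMAN.
from urllib.parse import quote

value = "\u2603"  # the character the user typed into the form field

# (a) Percent-encode the UTF-8 bytes of the character directly:
utf8_form = quote(value.encode("utf-8"))      # "%E2%98%83"

# (b) Replace non-ASCII characters with numeric character references
#     first, then percent-encode the resulting ASCII string:
ncr = "".join(c if ord(c) < 0x80 else f"&#{ord(c)};" for c in value)
ncr_form = quote(ncr, safe="")                # "%26%239731%3B"

print(utf8_form, ncr_form)
```

A server that naively stores and redisplays the decoded value from case (b) would show the literal `&#9731;`, which matches the behaviour the bug report describes.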
