http://issues.apache.org/bugzilla/show_bug.cgi?id=34985

------- Additional Comments From [EMAIL PROTECTED] 2006-11-04 12:14 -------

I'm not sure what you mean by security implications, but I don't think falling back to another encoding such as ISO-8859-1 is necessary.

Take TWiki as an example. It uses paths like /bin/view/Main/WebHome, where "view" is the CGI script and /Main/WebHome is the PATH_INFO (see http://twiki.org/cgi-bin/viewfile/Support/ApacheErrorsDuringEdit?rev=1.1;filename=testenv.htm for an example of the CGI environment variables). To handle non-UTF-8 encodings such as ISO-8859-1 (which Firefox currently uses for POSTs), it would be useful to specify the following handling per variable:

  AUTH_TYPE             Raw
  DOCUMENT_ROOT         Convert
  GATEWAY_INTERFACE     Raw
  HTTP_ACCEPT           Raw
  HTTP_ACCEPT_CHARSET   Raw
  HTTP_ACCEPT_ENCODING  Raw
  HTTP_ACCEPT_LANGUAGE  Raw
  HTTP_CONNECTION       Raw
  HTTP_HOST             Raw
  HTTP_KEEP_ALIVE       Raw
  HTTP_USER_AGENT       Raw
  PATH                  Convert (since it contains pathnames)
  QUERY_STRING          Raw (not a filename; should be interpreted by the application)
  REMOTE_ADDR           Raw
  REMOTE_PORT           Raw
  REMOTE_USER           Raw
  REQUEST_METHOD        Raw
  REQUEST_URI           Convert if valid UTF-8 (and not an overlong encoding)
  SCRIPT_FILENAME       Convert if valid UTF-8 (and not an overlong encoding)
  SCRIPT_NAME           Convert if valid UTF-8 (and not an overlong encoding)
  SERVER_ADDR           Raw
  SERVER_ADMIN          Raw
  ....                  (the rest are all Raw)

Basically, only the variables that correspond to filenames should be converted, and then only if they are valid UTF-8 without overlong encodings. Variables that Apache itself does not use should not be converted at all; they should be left to the application, or to a suitable add-on Apache module that does the conversion.

TWiki does its own interpretation of UTF-8 URLs, independent of the OS it runs on, based on a technique used by IBM's web server for mainframes (z/OS): it tries to recognise the URL as UTF-8 and, if that fails, falls back to the native encoding (i.e. no conversion at all). In fact, we already do this on PATH_INFO ourselves.
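The per-variable handling above, combined with the TWiki-style "try UTF-8, otherwise leave it alone" fallback, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Apache's or TWiki's actual code: the `Policy` enum, the `POLICY` table and the `convert_env` / `decode_if_utf8` functions are hypothetical names, values are assumed to arrive as raw bytes, and the sketch relies on Python's strict UTF-8 decoder to reject overlong encodings.

```python
from __future__ import annotations

from enum import Enum


class Policy(Enum):
    RAW = "raw"                  # leave the bytes exactly as received
    CONVERT = "convert"          # server-supplied paths: always decode
    CONVERT_IF_UTF8 = "if_utf8"  # decode only if strictly valid UTF-8


# Hypothetical policy table covering a few entries from the list above;
# any variable not listed here defaults to RAW.
POLICY = {
    "DOCUMENT_ROOT": Policy.CONVERT,
    "PATH": Policy.CONVERT,
    "REQUEST_URI": Policy.CONVERT_IF_UTF8,
    "SCRIPT_FILENAME": Policy.CONVERT_IF_UTF8,
    "SCRIPT_NAME": Policy.CONVERT_IF_UTF8,
}


def decode_if_utf8(raw: bytes) -> str | None:
    """Return the decoded string if raw is well-formed UTF-8, else None."""
    try:
        # Python's strict UTF-8 codec rejects overlong forms such as
        # b"\xc0\xaf" as well as encoded surrogates.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


def convert_env(env: dict[str, bytes]) -> dict[str, bytes | str]:
    """Apply POLICY to raw CGI variables; unconverted values stay as bytes."""
    out: dict[str, bytes | str] = {}
    for name, raw in env.items():
        policy = POLICY.get(name, Policy.RAW)
        if policy is Policy.CONVERT:
            # Assumption: server-configured paths are always UTF-8, so a
            # decode failure here surfaces as a configuration error.
            out[name] = raw.decode("utf-8")
        elif policy is Policy.CONVERT_IF_UTF8:
            decoded = decode_if_utf8(raw)
            # Fall back to the native bytes (no conversion) if not UTF-8.
            out[name] = decoded if decoded is not None else raw
        else:
            out[name] = raw
    return out
```

For example, a REQUEST_URI sent as UTF-8 comes back as a decoded string, while a SCRIPT_NAME containing the ISO-8859-1 byte 0xE9 is returned untouched as bytes, and QUERY_STRING is never decoded at all.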
If Apache is going to carry on doing its own UTF-8 to UCS-2 conversion, which I suppose it must in cases that map onto a Windows filesystem (and others such as Mac OS X HFS+), it would be good if it recognised when data really is UTF-8 in this way. It would also be very helpful to have a configuration option that says "don't convert variable X if it matches regex Y", e.g. don't convert PATH_INFO if it matches "/twiki/bin/.*".

Some TWiki pages that might be of interest here:

  http://twiki.org/cgi-bin/view/Codev/EncodeURLsWithUTF8
    How TWiki auto-detects and converts UTF-8 encoding for PATH_INFO in URLs.

  http://twiki.org/cgi-bin/view/Codev/InternationalisationUTF8
    Material on character set auto-detection, including an excerpt on the IBM web server approach. Fortunately, UTF-8 detection is much easier than the general case.

  http://twiki.org/cgi-bin/view/Codev/MacOSXFilesystemEncodingWithI18N
    A filesystem-related issue with Unicode normalisation forms on Mac OS X.

  http://twiki.org/cgi-bin/view/Codev/ProposedUTF8SupportForI18N
    General page summarising research on UTF-8 for TWiki, including some useful links.
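The suggested "don't convert variable X if it matches regex Y" option could behave roughly like the Python sketch below. It is an illustration only: `NO_CONVERT_RULES`, `is_excluded` and `maybe_convert` are hypothetical names, no real Apache directive is implied, and the single rule is just the PATH_INFO example from the comment. Non-excluded values are still decoded only when they are valid UTF-8.

```python
import re

# Hypothetical rule list: (variable name, pattern on the raw bytes).
# A matching value is left unconverted for the application to interpret.
NO_CONVERT_RULES = [
    ("PATH_INFO", re.compile(rb"/twiki/bin/.*")),
]


def is_excluded(name: str, raw: bytes) -> bool:
    """True if some rule for this variable matches its raw value."""
    # re.match anchors at the start of the value; whether a real server
    # option would anchor the same way is an assumption.
    return any(var == name and pattern.match(raw)
               for var, pattern in NO_CONVERT_RULES)


def maybe_convert(name: str, raw: bytes):
    """Decode raw as UTF-8 unless it is excluded or not valid UTF-8."""
    if is_excluded(name, raw):
        return raw  # left as-is for the application
    try:
        # Strict decoding rejects overlong encodings.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw  # not UTF-8: fall back to no conversion
```

With this rule, a PATH_INFO under /twiki/bin/ is passed through as bytes even when it is valid UTF-8, while the same value in any other variable is still decoded.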
