At 12:29 AM 3/14/2002, you wrote:
> >Apache 1.3 on Win32 assumes that the names of files served are
> >comprised solely of characters from character sets which are a superset
> >of ASCII, such as UTF-8 or ISO-8859-1.
>
>Umm, I assume that ASCII as you refer to it is its 7-bit incarnation.
>
>Note that _all_ character sets are supersets of 7-bit ASCII, and most
>are supersets of 8-bit ASCII (the exceptions being the various other
>'latin' encodings - i.e. ISO-8859-2 through ISO-8859-16 which differ
>in the various 'special' characters).
>
>This has the lovely side-effect that English is always an option,
>regardless of the actual encoding being used.
Uhmm... you are only partially correct. Yes, 7-bit ASCII would exist
unblemished in nearly all character sets, were it not for multibyte
encodings. Some encodings are 7-bit clean; that is, no byte of their
other characters falls in 0x00-0x7F. Examples are UTF-8 and most
European encodings. Counterexamples, however, include many Asian
character sets, such as Shift-JIS, where a byte in 0x00-0x7F can mean
either its ASCII character or the trailing byte of a multibyte
character. The user of the Chinese character set who first commented
on this in bugs ran into exactly this problem with certain shifted
character combinations.

>I think you've missed the boat on this one. Asian versions of Windows will
>all probably use characters that you don't consider as ASCII (i.e. they will
>be wide - actually Microsoft have done a pretty good job of this).

No... Jeff didn't miss anything. Not only is this an issue with
encodings that are not 7-bit clean, but the 8-bit encodings are not
normalized correctly on Win32, and Win32 is case insensitive.
Essentially, any <Files> or <Directory> blocks they use to protect
file paths that include 8-bit characters don't even map correctly for
Windows-1252 or OEM-437. Those are the sad but accurate facts.

For tolower/toupper/strcasecmp, I will have a patch sometime this month
to trust UTF-8 and normalize appropriately, using the Win32 API, which
gives us some greater assurance that the mappings correspond to
filename processing semantics. For 2.0, of course.
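
To make the two problems above concrete, here is a rough Python sketch.
It is not Apache code, and the helper names are hypothetical, chosen
only for illustration. The first helper lists any ASCII-range bytes
hiding inside multibyte characters (the "not 7-bit clean" case). The
second mimics a byte-wise case-insensitive compare that folds only
A-Z, which I am assuming behaves roughly like tolower() in the "C"
locale.

```python
# Illustrative sketch only; helper names are hypothetical, not Apache APIs.

def ascii_range_trail_bytes(text, encoding):
    """Return the bytes below 0x80 that occur inside multibyte characters."""
    found = []
    for ch in text:
        encoded = ch.encode(encoding)
        if len(encoded) > 1:
            # In a 7-bit clean encoding, no byte of a multibyte
            # character falls in the ASCII range 0x00-0x7F.
            found.extend(b for b in encoded if b < 0x80)
    return found


def bytewise_casecmp_equal(a, b):
    """Compare two byte strings, folding ASCII letters only."""
    # bytes.lower() leaves every byte >= 0x80 untouched.
    return a.lower() == b.lower()


# Katakana "so" (U+30BD) is 0x83 0x5C in Shift-JIS, and 0x5C is also
# the ASCII backslash, i.e. the Win32 path separator.
so = "\u30bd"
print(ascii_range_trail_bytes(so, "shift_jis"))
print(ascii_range_trail_bytes(so, "utf-8"))

# The same name in upper and lower case, encoded as Windows-1252.
upper = "CAF\u00c9".encode("cp1252")
lower = "caf\u00e9".encode("cp1252")
print(bytewise_casecmp_equal(upper, lower))
print(upper.decode("cp1252").lower() == lower.decode("cp1252").lower())
```

The Shift-JIS call reports a stray 0x5C while the UTF-8 call reports
nothing. The byte-wise compare treats the two Windows-1252 names as
different, although they match once decoded and case-folded as
characters. That mismatch is the kind of thing that lets an 8-bit path
slip past a case-insensitive block.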
