On Wed, Jan 29, 2003 at 11:14:56AM +0100, Peter Karlsson wrote:
> Denis Barbier:
> > Err, ascii(7) tells me that 0x5C *is* a backslash.
>
> Yes, but these documents aren't ASCII, so 0x5C may or may not be a
> backslash there, depending on where it is located in the file.
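A minimal Python sketch (not WML code) of the point above: a byte-oriented scanner treats every 0x5C byte as a backslash, while a character-oriented scanner decodes first and only sees real backslashes. It assumes Python's built-in "big5" codec and uses 功 (U+529F), whose Big5 encoding A5 5C ends in 0x5C, as a stand-in for the characters in others.zh.po; the function names are hypothetical.

```python
# Assumption: in Big5, U+529F (功) is encoded as A5 5C, so its trail
# byte is identical to the ASCII backslash 0x5C.

BACKSLASH = 0x5C

def count_backslash_bytes(data: bytes) -> int:
    """Byte-oriented view: every 0x5C byte looks like a backslash."""
    return data.count(BACKSLASH)

def count_backslash_chars(data: bytes, encoding: str) -> int:
    """Character-oriented view: decode first, then count U+005C."""
    return data.decode(encoding).count("\\")

# 功 followed by one real backslash, encoded as Big5 bytes A5 5C 5C.
raw = "功\\".encode("big5")

print(count_backslash_bytes(raw))          # 2: the trail byte is miscounted
print(count_backslash_chars(raw, "big5"))  # 1: only the real backslash
```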
Ok.

> > Could you please have a look at chinese/po/others.zh.po and tell me
> > what to do with Subscribe/Unsubscribe translations?
>
> Nothing should need to be done, since the 0x5C byte is the trail byte
> of the character; a proper MBCS-aware string scanner will recognize
> that it is not a backslash character (unlike, for instance, the
> "please respect the ad policy" string a bit further down, which *does*
> contain a backslash in the translation). Getting the string scanner to
> work properly requires configuring the locales properly.

The problem with current WML is that streams are bytes and not
characters; this is why 0x5C bytes have to be escaped. I am preparing a
character-oriented version, but there are major backward compatibility
problems: each file must use a single encoding, so some files under
webwml have to be fixed.

> Big5 is a bit problematic since it allows non-highbit characters as
> trail bytes, similar to the problems with ISO 2022-JP. A stateful
> string scanner is required to handle it properly. LibC should work fine
> as long as the proper locale is available, and I am pretty sure that
> the gettext utilities will handle this properly.

Yes, gettext is safe. Instead of escaping some problematic characters, a
better solution could be to convert the files (as is done with Japanese
files) to a safe encoding. Is anyone interested in testing this scheme?

Denis
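A rough Python illustration of the conversion idea, under stated assumptions: the message does not say which "safe" encoding is meant, so UTF-8 is assumed here (in UTF-8 every byte of a multibyte sequence is >= 0x80, so a 0x5C byte is always a real backslash). escape_backslashes and process_via_utf8 are hypothetical helpers standing in for WML's byte-level processing, not actual WML functions.

```python
# Assumption: UTF-8 is the "safe" intermediate encoding.

def escape_backslashes(data: bytes) -> bytes:
    """Hypothetical byte-level step: double every 0x5C byte."""
    return data.replace(b"\\", b"\\\\")

def process_via_utf8(data: bytes, source_encoding: str) -> bytes:
    """Convert to UTF-8, run the byte-level step, convert back."""
    utf8 = data.decode(source_encoding).encode("utf-8")
    # In UTF-8 no lead or trail byte of a multibyte character is 0x5C,
    # so the byte-level step only touches genuine backslashes.
    processed = escape_backslashes(utf8)
    return processed.decode("utf-8").encode(source_encoding)

# Big5 bytes A5 5C 5C: the character 功 followed by one real backslash.
sample = "功\\".encode("big5")

naive = escape_backslashes(sample)      # also doubles 功's trail byte
safe = process_via_utf8(sample, "big5")

print(naive.decode("big5").count("\\"))  # 3: trail byte leaked out as a backslash
print(safe.decode("big5").count("\\"))   # 2: only the real backslash was escaped
```

Working directly on the Big5 bytes corrupts the text, because the doubled trail byte decodes as an extra backslash, while the round trip through UTF-8 escapes only the backslash that was actually there.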

