Re: Invalid UTF-8 sequences (was: Re: Nicest UTF)

Arcane Jill Thu, 09 Dec 2004 06:29:00 -0800


-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Antoine Leca
Sent: 09 December 2004 11:29
To: Unicode Mailing List
Subject: Re: Invalid UTF-8 sequences (was: Re: Nicest UTF)

Windows filesystems do know what encoding they use.
Err, not really. MS-DOS *needs to know* the encoding to use, a bit like a
*nix application that displays filenames needs to know the encoding to use
the correct set of glyphs (but the constraints are much heavier).

Sure, but MS-DOS is not Windows. MS-DOS uses "8.3" filenames. But it's not like MS-DOS is still terrifically popular these days.

But when it comes to other Windows applications (still the more common) that
happen to operate in 'Ansi' mode, they are subject to the hazard of codepage
translations.

Sure, but this has got nothing to do with the filesystem. The Windows filesystem(s) store filenames in those disk sectors which are reserved for file headers, and in these locations they are stored using sixteen-bit-wide code units (I assume this can only be UTF-16?). Thus, "Windows filesystems do know what encoding they use" seems to me to be a correct statement.
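
To make the "sixteen-bit wide code units" point concrete, here is a minimal Python sketch. It only models the encoding step in memory and does not read any on-disk structure; the helper name `utf16_code_units` and the sample filenames are made up for illustration.

```python
import struct

def utf16_code_units(name: str) -> list:
    """Return the 16-bit code units of `name` when encoded as UTF-16 (little-endian)."""
    raw = name.encode("utf-16-le")
    # Each code unit is an unsigned 16-bit integer ("H"), little-endian ("<").
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))

# A name using only BMP characters: one code unit per character.
print([hex(u) for u in utf16_code_units("caf\u00e9.txt")])

# A character outside the BMP (U+1D11E) takes two code units, a surrogate pair.
print([hex(u) for u in utf16_code_units("\U0001D11E")])
```

Whether a given filesystem actually validates that the stored units form well-formed UTF-16 is a separate question that this sketch does not address.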

The fact that applications can still open files using the legacy fopen() call (which requires char*, hence 8-bit-wide, strings) is kind of irrelevant. If the user creates a file using fopen() via a code page translation, AND GETS IT WRONG, then the file will be created with Unicode characters other than those she intended - but those characters will still be Unicode and unambiguous, no?
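
The "gets it wrong" case can be sketched in Python as follows. This is an illustration under stated assumptions: the application's bytes are in Windows-1252 while the translation layer assumes code page 437, and `to_wide` is a hypothetical stand-in for the code page translation, not an actual Windows API.

```python
def to_wide(name_bytes: bytes, assumed_codepage: str) -> str:
    """Hypothetical stand-in for the 8-bit to Unicode translation of a char* filename."""
    return name_bytes.decode(assumed_codepage)

intended = "caf\u00e9"                    # what the user meant
name_bytes = intended.encode("cp1252")    # the 8-bit string handed to fopen()

right = to_wide(name_bytes, "cp1252")     # translated with the correct code page
wrong = to_wide(name_bytes, "cp437")      # translated with the wrong code page

print(right == intended)   # True
print(wrong == intended)   # False: byte 0xE9 maps to a different character in cp437

# The wrong name is still a well-defined Unicode string and encodes cleanly to UTF-16.
print([hex(ord(c)) for c in wrong], wrong.encode("utf-16-le"))
```

The name that results is not the one the user wanted, but nothing about it is ambiguous: it is a definite sequence of Unicode characters.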

that is, usually, it is restricted to US ASCII, very much like the usable
set in *nix cases...

[OFF TOPIC] Why do so many people call it "US ASCII" anyway? Since "ASCII" comprises that subset of Unicode from U+0000 to U+007F, it is not clear to me in what way "US-ASCII" is different from ASCII. It's bad enough for us non-Americans that the A in ASCII already stands for "American", but to stick "US" on the front as well is just .... Anyway, back to the discussion on US-Unicode...
