Re: guile can't find a chinese named file

Marko Rauhamaa Mon, 30 Jan 2017 12:47:06 -0800

Eli Zaretskii <[email protected]>:

>> From: Marko Rauhamaa <[email protected]>
>> 
>> UTF-8 beautifully bridges the interpretation gap between 8-bit character
>> strings and text. However, the interpretation step should be done in the
>> application and not in the programming language.
>
> You can't do that in an environment that specifically targets
> sophisticated multi-lingual text processing independent of the outside
> locale.  Unless you can interpret byte sequences as characters, you
> will be unable to even count characters in a range of text,


If you need to operate on Unicode text, have the application invoke the
UTF-8 (or locale-specific) decoder. However, have the application
request it instead of guessing that the environment is all Unicode.
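
A minimal sketch of what "have the application request it" could look like, written in Python purely for illustration. The helper name as_text is hypothetical; the point is only that the data stays a byte string until the caller explicitly asks for a decode.

```python
def as_text(raw, encoding="utf-8"):
    """Decode only when the application explicitly asks for text."""
    return raw.decode(encoding)

raw = b"caf\xc3\xa9"          # five octets, as they arrived from the outside
octet_count = len(raw)        # counting octets needs no decoder at all
char_count = len(as_text(raw))  # counting code points requires the explicit decode

print(octet_count, char_count)
```

Counting characters is still possible, but only after the application has stated which encoding it believes the bytes are in.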

> You do need "other typesetting effects", naturally, but that doesn't
> mean you can get away without more or less full support of Unicode
> nowadays.

Do support it, fully even, but let the application invoke the
conversion when appropriate.

> You are talking about programming, but we should instead think about
> applications -- those of them which need to process text, or even
> access files, as this discussion shows, do need decent Unicode
> support.

Why should opening a file require Unicode support if the underlying
operating system knows nothing about Unicode? I can open any given
file in a tiny C program without any Unicode support, under Linux, that
is.
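
The same thing can be sketched outside C. The following Python example is an illustration under assumptions, not a claim about Guile: it passes the file name as raw octets to the os-level calls and never decodes it. The helper name write_and_read and the sample name are made up for the example; the octets happen to be the UTF-8 encoding of a Chinese file name, but the program never needs to know that.

```python
import os
import tempfile

def write_and_read(directory, name, payload):
    """Create and reopen a file whose name is passed as raw bytes."""
    path = os.path.join(directory, name)  # bytes joined with bytes stays bytes
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, payload)
    finally:
        os.close(fd)
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.read(fd, len(payload))
    finally:
        os.close(fd)

with tempfile.TemporaryDirectory() as tmp:
    # The program only ever handles these octets; no decoding step is involved.
    name = b"\xe4\xb8\xad\xe6\x96\x87.txt"
    data = write_and_read(os.fsencode(tmp), name, b"hello\n")

print(data)
```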

> E.g., users generally expect that decomposed and composed character
> sequences behave and are treated identically, although they are
> different byte-stream wise.

Linux begs to differ. Regardless of the locale, two different octet
sequences that ought to be equivalent UTF-8-wise will be considered
different pathnames under Linux.
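
To make the two octet sequences concrete, here is a small Python sketch that builds the composed (NFC) and decomposed (NFD) forms of the same visible text and compares their UTF-8 encodings. The sample word is an arbitrary choice for illustration.

```python
import unicodedata

composed = unicodedata.normalize("NFC", "caf\u00e9")    # e-acute as one code point
decomposed = unicodedata.normalize("NFD", "caf\u00e9")  # "e" plus combining acute

name_a = composed.encode("utf-8")
name_b = decomposed.encode("utf-8")

print(name_a)
print(name_b)
print(name_a == name_b)
```

A byte-oriented pathname interface sees only name_a and name_b, so the two would name two different files.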

I don't need a helicopter to walk across the street.

>> But is also causing unnecessary grief in the computer-computer
>> interface, where the classic textual naming and textual protocols
>> are actually cutely chosen octet-aligned binary formats.
>
> The universal acceptance of UTF-8 nowadays makes this much less of an
> issue, IME.

You are jumping the gun. Linux won't be there for a long time if ever.
Nothing prevents a pathname, or a command-line argument, or an
environment variable, or the standard input from containing illegal
UTF-8.
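
As an illustration of what that means for a runtime that insists on decoding, the Python sketch below takes a byte string that is not valid UTF-8 (the sample value is invented) and shows that a strict decode fails. It then uses Python's "surrogateescape" error handler, one existing workaround, to carry the bytes through a string and back unchanged. The helper name strict_decode_fails is hypothetical.

```python
raw = b"report-\xff.txt"  # 0xFF can never appear in well-formed UTF-8

def strict_decode_fails(data):
    """Return True if a strict UTF-8 decode of data raises."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False

# Undecodable bytes are mapped to lone surrogates and mapped back on encode.
smuggled = raw.decode("utf-8", "surrogateescape")
restored = smuggled.encode("utf-8", "surrogateescape")

print(strict_decode_fails(raw), restored == raw)
```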

I also wouldn't like my SMTP server to throw a UTF-8 decoding exception
on parsing a command.
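
A rough sketch of command parsing that stays in the byte domain, so no decoding exception can occur. This is a simplified illustration in Python, not a conforming SMTP parser, and the function name parse_command is hypothetical.

```python
def parse_command(line):
    """Split a protocol line into (verb, rest) without decoding anything."""
    line = line.rstrip(b"\r\n")
    verb, _, rest = line.partition(b" ")
    # bytes.upper() only touches ASCII letters, so stray octets pass through.
    return verb.upper(), rest

print(parse_command(b"mail FROM:<\xff@example.org>\r\n"))
print(parse_command(b"QUIT\r\n"))
```

The verb is matched as ASCII, and whatever octets follow are handed on untouched for the application to interpret or reject.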

(Also note that even Windows allows pathnames with illegal Unicode in
them if I'm not mistaken.)


Marko
