RE: Nicest UTF

2004-12-14 Thread Lars Kristan
Title: RE: Nicest UTF





D. Starner wrote:
  Some won't convert any and will just start using UTF-8 
  for new ones. And this should be allowed. 
 
 Why should it be allowed? You can't mix items with
 different unlabeled encodings willy-nilly. All you're going
 to get, all you can expect to get is a mess.


Easy for you to say. You're not the one who is going to answer the support calls. They WILL do it. You can jump up and down as much as you like, but they will. If I tell users what you are telling me, they will think I am mad and will stop using my application.


Lars





RE: Nicest UTF

2004-12-13 Thread Lars Kristan
Title: RE: Nicest UTF





Marcin 'Qrczak' Kowalczyk wrote:
  My my, you are assuming all files are in the same encoding.
 
 Yes. Otherwise nothing shows filenames correctly to the user.
UNIX is a multi-user system. One user can use one locale and might never see files from another user that uses a different locale. And users can even have filenames in the 'wrong' locale in their own home directory, copied from somewhere. Perhaps only a letter here and there does not display correctly, but that doesn't mean the user can't use the file.

 
  And what about all the references to the files in scripts?
  In configuration files?
 
 Such files rarely use non-ASCII characters. Non-ASCII characters are
 primarily used in names of documents created explicitly by the user.
Rarely. So only rare systems will not boot after the conversion. And only rare programs will no longer work. Is that acceptable?

Plus, it might not be as rare as you think. It might be far more common in a country where not many people understand English and are not using Latin letters on top of it.

Also, a script (a UNIX batch file) may have an ASCII name, but what if it processes some user documents for some purpose and has a set of filenames hardcoded in it? What about MRU lists? What about documents that link to other documents?

Mass renaming is a dangerous thing. It should be done gradually and with utmost care. And during this period, everything should keep working. If not, users won't even start the process.

 
  Soft links?
 
 They can be fixed automatically.
Uh, yes, not a good example. Except if one decides to allow the user to select an option to use U+FFFD instead of failing the conversion. Then you need to be extra careful, rename any files that convert to a single name, and keep track of everything so you can use the right names for the soft links. But yes, it can be done. If, on the other hand, you adopt the 'broken' conversion concept, you can convert all filenames in a single pass, and you don't need to build lists of soft links since you can convert them directly.
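As an illustration of the bookkeeping involved, here is a minimal sketch of such a single-pass symlink conversion in Python; the 'decode as Latin-1, re-encode as UTF-8' policy and the /home root are purely illustrative assumptions, not a recommendation:

    import os

    def convert_name(raw: bytes) -> bytes:
        # Illustrative policy only: decode as Latin-1, re-encode as UTF-8.
        return raw.decode("latin-1").encode("utf-8")

    def convert_symlinks(root: bytes) -> None:
        for dirpath, dirnames, filenames in os.walk(root):
            for name in dirnames + filenames:
                path = os.path.join(dirpath, name)
                if os.path.islink(path):
                    target = os.readlink(path)        # bytes in, bytes out
                    new_target = convert_name(target)
                    if new_target != target:
                        os.remove(path)               # recreate the link with
                        os.symlink(new_target, path)  # the converted target

    convert_symlinks(b"/home")   # hypothetical root; try it on a copy first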

 
  If you want to break things, this is definitely the way to do it.
 
 Using non-ASCII filenames is risky to begin with. Existing tools don't
 have a good answer to what should happen with these files when the
 default encoding used by the user changes, or when a user using a
 different encoding tries to access them.
Not really. On UNIX, it is all very well defined. A filename is a sequence of bytes which is only interpreted when it is displayed. You can place a filename in a script or a configuration file and the file will be identified and opened regardless of your locale setting.

People like you and me avoid non-ASCII filenames. But not all users do.



 Mozilla doesn't show such filenames in a directory listing. You
 may consider it a bug, but this is a fact. Producing non-UTF-8 HTML
 labeled as UTF-8 would be wrong too. There is no good solution to
 the problem of filenames encoded in different encodings.
There is no good solution. True. And I am trying to find one. And yes, I would consider that a bug. They should probably use some escaping technique. And, funny thing, you would probably accept the escaping technique. But if you think about it, it is again representing invalid data with valid Unicode characters. And if un-escaping needs to be done, it introduces all the problems that you are pointing out for my 'broken' conversion. So, think of my 128 codepoints as an escaping technique. One with no overhead. One with little possibility of confusion. One that can be standardized, so whoever comes across it will know exactly what it is. Which is definitely not true if we let each application devise its own escaping, with no way for them to interoperate.
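For comparison, Python's 'surrogateescape' error handler (PEP 383) implements a closely related escaping scheme, mapping each invalid byte to a lone surrogate U+DC80..U+DCFF instead of 128 dedicated codepoints; a minimal sketch of the round trip (the filename bytes are a made-up example):

    raw = b"report-\xe9t\xe9.txt"        # Latin-1 bytes on a UTF-8 system

    name = raw.decode("utf-8", errors="surrogateescape")
    print(ascii(name))                   # 'report-\udce9t\udce9.txt'

    roundtrip = name.encode("utf-8", errors="surrogateescape")
    assert roundtrip == raw              # lossless, unlike errors="replace"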

  As soon as you realize you cannot convert filenames to UTF-8, you
  will see that all you can do is start adding new ones in UTF-8.
  Or forget about Unicode.
 
 I'm not using a UTF-8 locale yet, because too many programs don't
 support it.
Like Mozilla. I am showing you the way programs can be made to work with UTF-8 faster and easier. And really by fixing them, not by rewriting them. At least some programs, or some portions of programs. Then developers can concentrate on the things that do require extra attention, like strupr or isspace (or their equivalents).

 I'm using ISO-8859-2.
In fact you're lucky. Many ISO-8859-1 filenames display correctly in ISO-8859-2. Not all users are so lucky.


 But almost all filenames are ASCII.
Basically, you are avoiding the problem altogether. A wise decision. But it also means you don't know as much about this problem as I do.


Lars





RE: Nicest UTF

2004-12-13 Thread Lars Kristan
Title: RE: Nicest UTF





D. Starner wrote:
 Lars Kristan writes:
 
   A system administrator (because he has access to all files).
  My my, you are assuming all files are in the same encoding. 
 And what about
  all the references to the files in scripts? In 
 configuration files? Soft
  links? If you want to break things, this is definitely the 
 way to do it.
 
 Was it ever really wise to use non-ASCII file names in 
 scripts and configuration
 files?
It goes beyond that. Please see my reply to Marcin 'Qrczak' Kowalczyk.


 It's not very hard to convert soft links at the same 
 time.
Please see my reply to Marcin 'Qrczak' Kowalczyk.


 Even if you can't do a system-wide change, it's easy enough 
 to change the
 system files, and post a message about switching to UTF-8, 
 and offering to
 assist any users with the change.
That's perfectly fine. But I started talking about this because I claimed that you are likely to end up having UTF-8 filenames alongside legacy-encoded filenames. If you do it gradually, that is precisely what is going to happen, at least for a certain period. But this period could be longer than expected. And as it turns out, things are not simple: some users may never convert all the filenames. Some won't convert any and will just start using UTF-8 for new ones. And this should be allowed. Assuming that all filenames should be valid UTF-8 is a bad argument against my claims that applications should be able to process filenames with invalid UTF-8 sequences.


Lars





RE: Nicest UTF

2004-12-13 Thread D. Starner
 Some won't convert any and will just start using UTF-8 
 for new ones. And this should be allowed. 

Why should it be allowed? You can't mix items with
different unlabeled encodings willy-nilly. All you're going
to get, all you can expect to get is a mess.





Re: Nicest UTF

2004-12-13 Thread John Cowan
Lars Kristan scripsit:

  I'm using ISO-8859-2.
 In fact you're lucky. Many ISO-8859-1 filenames display correctly in
 ISO-8859-2. Not all users are so lucky.

It was a design point of ISO-8859-{1,2,3,4}, but not any other variants,
that every character appears either at the same codepoint or not at all.

-- 
John Cowan[EMAIL PROTECTED]
At times of peril or dubitation,  http://www.ccil.org/~cowan
Perform swift circular ambulation,http://www.reutershealth.com
With loud and high-pitched ululation.


Re: Nicest UTF

2004-12-13 Thread Philippe Verdy
From: D. Starner [EMAIL PROTECTED]
Some won't convert any and will just start using UTF-8
for new ones. And this should be allowed.
Why should it be allowed? You can't mix items with
different unlabeled encodings willy-nilly. All you're going
to get, all you can expect to get is a mess.
Saying you can't is excessive when speaking about filesystems, which do NOT
label their encoding, and which allow multiple users to use and create files
on shared filesystems with different locales, each having a different
encoding.

So it does happen that the same filesystem stores multiple encodings for its 
filenames. It also happens that systems allow mounting remote filesystems 
shared from systems using distinct system encodings (so even if each filesystem 
is consistent, the filenames appear with various encodings, and this leads 
to more complex situations when they are cross-linked with soft links or 
URLs).

Think about the web: it is a filesystem in itself, whose names (URLs) 
include inconsistent encodings. Although there is a recommendation to use 
UTF-8 in URLs, it is not mandatory, and there are lots of hosts that use 
URLs created with some ISO-8859 charset, or even Windows or Macintosh 
codepages.

To resolve some problems, HTML specifications allow additional (but 
out-of-band) attributes to resolve the encoding used for resource contents, 
but this has no impact on URLs themselves.

The current solution is to use URL-encoding and treat them as binary 
sequences with a restricted set of byte values, but this time it means 
transforming what was initially plain-text into some binary moniker.
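For illustration, this is roughly what the URL-encoding approach looks like for raw filename bytes, using Python's standard library (the filename is a made-up example); it is reversible, but the result is a binary moniker rather than readable text:

    from urllib.parse import quote_from_bytes, unquote_to_bytes

    raw = b"caf\xe9.html"                      # Latin-1 bytes, not valid UTF-8
    component = quote_from_bytes(raw)          # 'caf%E9.html'
    print(component)
    assert unquote_to_bytes(component) == raw  # reversible, but opaque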

Unfortunately, many web search engines do use the URLs to assess the 
relevance of search keywords, instead of treating them only as blind 
monikers.

Much has been done to internationalize domain names for use in IRIs, but 
URLs remain a mess and a mixture of various charsets, and IRIs are still 
rarely supported in browsers.

The problem with URLs is that they must be allowed to contain any valid 
plain text, notably form data submitted with a GET method, because this 
plain-text data becomes part of a query string, itself part of the URL. HTML 
does allow specifying in the HTML form which encoding should be used for 
this form data, because servers won't always expect a single and consistent 
encoding. The absence of this specification is often interpreted by browsers 
as meaning that form data must be encoded with the same charset as the HTML 
form itself, but not all browsers observe this rule (in addition, many web 
pages are incorrectly labelled, simply because of incorrect or limited HTTP 
server configurations, and the standards specify that the charset given 
in the HTTP headers has priority over the charset specified in the encoded 
documents themselves; this was a poor decision, which is inconsistent with 
the usage of the same HTML documents on filesystems that do not store the 
charset used for the file content)...

So don't think that this is simple. It is legitimate to be able to refer to 
some documents which we know are plain text but have unknown or ambiguous 
encodings (and there is much work on the automated identification 
of language/charset pairs used in documents; none of these methods is 100% 
free of false guesses).

For clients trying to use these resources with ambiguous or unknown 
encodings, but that DO know that this is effectively plain text (such as a 
filename), the solution of eliminating (ignoring, not showing, discarding...) all 
filenames or documents that look incorrectly encoded may be the worst 
solution: it gives the user no indication that these documents are 
missing, and it does not allow users to determine (even if 
characters are incorrectly displayed) which alternate encoding to try. It is 
legitimate to think about solutions allowing at least a partial representation 
of these texts, so that the user can look at how the text is actually encoded 
and get hints about which charset to select. Also, very lossy 
conversions (with U+FFFD) are not satisfactory.




Re: Nicest UTF

2004-12-12 Thread Marcin 'Qrczak' Kowalczyk
Lars Kristan [EMAIL PROTECTED] writes:

 My my, you are assuming all files are in the same encoding.

Yes. Otherwise nothing shows filenames correctly to the user.

 And what about all the references to the files in scripts?
 In configuration files?

Such files rarely use non-ASCII characters. Non-ASCII characters are
primarily used in names of documents created explicitly by the user.

 Soft links?

They can be fixed automatically.

 If you want to break things, this is definitely the way to do it.

Using non-ASCII filenames is risky to begin with. Existing tools don't
have a good answer to what should happen with these files when the
default encoding used by the user changes, or when a user using a
different encoding tries to access them.

As long as everybody uses the same encoding and files use it too,
things work. When the assumption is false, something will break.

 You mean, various programs will break at various points of time,
 instead of working correctly from the beginning?
 
 So far nothing broke. Because all the programs are in UTF-8.

This doesn't imply that they won't break. You are talking about
filenames which are *not* UTF-8, with the locale set to UTF-8.

Mozilla doesn't show such filenames in a directory listing. You
may consider it a bug, but this is a fact. Producing non-UTF-8 HTML
labeled as UTF-8 would be wrong too. There is no good solution to
the problem of filenames encoded in different encodings.

Handling such filenames is incompatible with using Unicode to process
strings. You have to go back to passing arrays of bytes with ambiguous
interpretation of non-ASCII characters, and live with inconveniences
like displaying garbage for non-ASCII filenames and broken sorting.

 Mixing any two incompatible filename encodings on the same file system
 is a bad idea.
 
 As soon as you realize you cannot convert filenames to UTF-8, you
 will see that all you can do is start adding new ones in UTF-8.
 Or forget about Unicode.

I'm not using a UTF-8 locale yet, because too many programs don't
support it. I'm using ISO-8859-2. But almost all filenames are ASCII.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/



Re: Nicest UTF

2004-12-12 Thread Marcin 'Qrczak' Kowalczyk
D. Starner [EMAIL PROTECTED] writes:

 But demanding that each program which searches strings checks for 
 combining classes is I'm afraid too much. 

 How is it any different from a case-insensitive search?

We started from string equality, which somehow changed into searching.
Default string equality is case-sensitive.

Searching for an arbitrary substring entered by a user should use
user-friendly rules which fold various minor differences like
decomposition and case and soft hyphens, but it's a rare task and
changing rules generally affects convenience rather than correctness.

String equality is used for internal and important operations like
lookup in a dictionary (not necessarily of strings ever viewed by
the user), comparing XML tags, filenames, mail headers, program
identifiers, hyperlink addresses etc. They should be unambiguous,
simple and fast. Computing approximate equivalence by folding minor
differences must be done explicitly when needed, as mandated by
relevant protocols and standards, not forced as the default.
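A small sketch of that distinction in Python (the strings are illustrative): default equality compares code point sequences, and any folding such as normalization or case folding is applied explicitly:

    import unicodedata

    a = "caf\u00e9"       # precomposed e-acute
    b = "cafe\u0301"      # e + combining acute accent

    print(a == b)                                    # False: code point equality
    print(unicodedata.normalize("NFC", a) ==
          unicodedata.normalize("NFC", b))           # True: folding done explicitly
    print(a.casefold() == "CAF\u00c9".casefold())    # True: case folding, also explicit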

  Does \n followed by a combining code point start a new line? 
  
  The Standard says no, that's a defective combining sequence. 
 
 Is there *any* program which behaves this way? 

 I misstated that; it's a new line followed by a defective combining
 sequence.

What is the definition of combining sequences?

 It doesn't matter that accented backslashes don't occur in practice.
 I do care for unambiguous, consistent and simple rules.

 So do I; and the only unambiguous, consistent and simple rule that
 won't give users hell is that ba never matches bä. Any programs
 for end-users must follow that rule.

Please give a precise definition of string equality. What representation
of strings it needs - a sequence of code points or something else?
Are all strings valid and comparable? Are there operations which give
different results for equal strings?

If string equality folded the difference between precomposed and
decomposed characters, then the API should hide that difference in
other places as well, otherwise string equality is not the finest
distinction between string values but some arbitrary equivalence
relation.

 My current implementation doesn't support filenames which can't be 
 encoded in the current default encoding. 

 The right thing to do, IMO, would be to support filenames as byte
 strings, and let the programmer convert them back and forth between
 character strings, knowing that it won't roundtrip.

Perhaps. Unfortunately it makes filename processing harder, e.g.
you can't store them in *text* files processed through a transparent
conversion between its encoding and Unicode. In effect we must go
back from manipulating context-insensitive character sequences to
manipulating byte sequences with context-dependent interpretation.

We can't even sort filenames using Unicode algorithms for collation
but must use some algorithms which are capable of processing both
strings in the locale's encoding and arbitrary byte sequences at the
same time. This is much more complicated than using Unicode algorithms
alone.

What is worse, in Windows filenames the primary representation of
filenames is Unicode, so programs which carefully use APIs based on
byte sequences for processing filenames will be less general than
Unicode-based APIs when the program is ported to Windows.

The computing world is slowly migrating from processing byte sequences
in ambiguous encodings to processing Unicode strings, often represented
by byte sequences in explicitly labeled encodings. There are relics
when the new paradigm doesn't fit well, like Unix filenames, but
sticking to the old paradigm means that programs will continue to
support mixing scripts poorly or not at all.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/




Re: Nicest UTF

2004-12-12 Thread Marcin 'Qrczak' Kowalczyk
Philippe Verdy [EMAIL PROTECTED] writes:

 It's hard to create a general model that will work for all scripts
 encoded in Unicode. There are too many differences. So Unicode just
 appears to standardize a higher level of processing with combining
 sequences and normalization forms that are better approaching the
 linguistic and semantic of the scripts. Consider this level as an
 intermediate tool that will help simplify the identification of
 processing units.

While rendering and user input may use evolving rules with complex
specifications and implementations which depend on the environment
and user's configuration (actually there is no other choice: this
is inherently complicated for some scripts), string processing in
a programming language should have a stable base with well-defined
and easy to remember semantics which doesn't depend on too many
settable preferences and version variations.

The more complex rules a protocol demands (case-insensitive
programming language identifiers, compared after normalization,
after bidi processing, with soft hyphens removed etc.), the more
tools will implement it incorrectly. Usually with subtle errors
which don't manifest until someone tries to process an unusual name
(e.g. documentation generation tool will produce hyperlinks with
dangling links, because a WWW server does not perform sufficient
transformations of addresses).

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/



Re: infinite combinations, was Re: Nicest UTF

2004-12-12 Thread Peter Kirk
On 11/12/2004 16:53, Peter R. Mueller-Roemer wrote:
...
For a fixed length of combining character sequence (base + 3 combining 
marks is the most I have seen graphically distinguishable) the 
repertoire is still finite.

In Hebrew it is actually possible to have up to 9 combining marks with a 
single base character:

shin + sin/shin dot + dagesh + rafe + 2 vowel points + 2 accents + dot 
above + masora circle

SBL Hebrew and Ezra SIL both make a valiant attempt to display this lot 
but don't quite get there:


But I think 5 is the maximum number which actually occur with any one 
base character in the Hebrew Bible.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/



Re: Nicest UTF

2004-12-11 Thread Marcin 'Qrczak' Kowalczyk
Philippe Verdy [EMAIL PROTECTED] writes:

[...]
 This was later amended in an errata for XML 1.0 which now says that
 the list of code points whose use is *discouraged* (but explicitly
 *not* forbidden) for the Char production is now:
[...]

Ugh, it's a mess...

IMHO Unicode is partially to blame, by introducing various kinds of
holes in code point numbering (non-characters, surrogages), by not
being clear when the unit of processing should be a code point and
when a combining character sequence, and earlier by pushing UTF-16 as
the fundamental representation of the text (which led to such horrible
descriptions as http://www.xml.com/axml/notes/Surrogates.html).

XML is just an example of a standard which must decide:
A. What is the unit of text processing? (code point? combining character
   sequence? something else? hopefully it would not be UTF-16 unit)
B. Which (sequences of) characters are valid when present in the raw
   source, i.e. what UTF-n really means?
C. Which (sequences of) characters can be formed by specifying a
   character number?

A programming language must do the same.

The language Kogut I'm designing and developing uses Unicode as string
representation, but the details can still be changed. I want to have
rules which are correct as far as Unicode is concerned, and which
are simple enough to be practical (e.g. if a standard forced me to
make the conversion from code point number to actual character
contextual, or if it forced me to unconditionally unify precomposed
and decomposed characters, then I quit and won't support a broken
standard).

Internal text processing in a programming language can be more
permissive than an application of such processing like XML parsing:
if a particular character is valid in UTF-8 but XML disallows it,
everything is fine, it can be rejected at some stage. It must not be
more restrictive however, as it would make impossible to implement XML
parsing in terms of string processing.

Regarding A, I see three choices:
1. A string is a sequence of code points.
2. A string is a sequence of combining character sequences.
3. A string is a sequence of code points, but it's encouraged
   to process it in groups of combining character sequences.

I'm afraid that anything other than a mixture of 1 and 3 is too
complicated to be widely used. Almost everybody is representing
strings either as code points, or as even lower-level units like
UTF-16 units. And while 2 is nice from the user's point of view,
it's a nightmare from the programmer's point of view:
- Unicode character properties (like general category, character
  name, digit value) are defined in terms of code points. Choosing
  2 would immediately require two-stage processing: a string is
  a sequence of sequences of code points.
- Unicode algorithms (like collation, case mapping, normalization)
  are specified in terms of code points.
- Data exchange formats (UTF-n) are always closer to code points
  than to combining character sequences.
- Code points have a finite domain, so you can make dictionaries
  indexed by code points; for combining character sequences we would
  be forced to make functions which *compute* the relevant property
  basing on the structure of such a sequence.

I don't believe 2 is workable at all. The question is how to make 3
convenient enough to be used more often. Unfortunately it's much
harder than 1, unless strings used some completely different iteration
protocols than other sequences. I don't have an idea how to make 3
convenient.
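As a sketch of what a mixture of 1 and 3 can look like in practice (Python): the string stays a sequence of code points, and grouping into combining character sequences is an explicit helper that the programmer calls when needed. This looks only at combining classes, not at full default grapheme cluster rules:

    import unicodedata

    def combining_sequences(s):
        # Each group starts at a code point with combining class 0 (a base
        # character) and collects the following nonzero-class code points.
        group = ""
        for ch in s:
            if unicodedata.combining(ch) == 0 and group:
                yield group
                group = ch
            else:
                group += ch
        if group:
            yield group

    print(list(combining_sequences("e\u0301a\u0300\u0323x")))
    # ['e\u0301', 'a\u0300\u0323', 'x']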

Regarding B in the context of a programming language (not XML),
chapter 3.9 of the Unicode standard version 4.0 excludes only
surrogates: it does not exclude non-characters like U+FFFF.
But non-characters must be excluded somewhere, because otherwise
U+FFFE at the beginning would be mistaken for a BOM. I'm confused.

Regarding C, I'm confused too. Should a function which returns
the character of the given number accept surrogates? I guess no.
Should it accept non-characters? I don't know. I only know that
it should not accept values above 0x10FFFF.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/



RE: Invalid UTF-8 sequences (was: Re: Nicest UTF)

2004-12-11 Thread Lars Kristan
Title: RE: Invalid UTF-8 sequences (was: Re: Nicest UTF)





Arcane Jill responded:
  Windows filesystems do know what encoding they use.
 Err, not really. MS-DOS *need to know* the encoding to use, 
 a bit like a
 *nix application that displays filenames need to know the 
 encoding to use
 the correct set of glyphs (but constrainst are much more heavy.)
 
 Sure, but MS-DOS is not Windows. MS-DOS uses 8.3 filenames. 
 But it's not 
 like MS-DOS is still terrifically popular these days.
I don't know what Antoine meant by MS-DOS, but since he mentioned it in the Windows context, I thought it was about Windows console applications (the console is still often referred to as a DOS box, I think).

 The fact that applications can still open files using the 
 legacy fopen() 
 call (which requires char*, hence 8-bit-wide, strings) is kind of 
 irrelevant. If the user creates a file using fopen() via a code page 
 translation, AND GETS IT WRONG, then the file will be created 
 with Unicode 
 characters other than those she expected - but those characters will 
 still be Unicode 
 and unambiguous, no?
Funny thing. Nobody cares much if a Latin 2 string is misinterpreted and Latin 1 conversion is used instead. As long as they can create the file. But if a Latin 2 string is misinterpreted and UTF-8 conversion is used? You won't just get the filename with characters other than those you expected. Either the file won't open at all (depending on where and how the validation is done), or you risk that two files you create one after another will overwrite each other. Note that I am talking about files you create from within this scenario, not files that existed on the disk before.

Second thing: OK, you say fopen is a legacy call. True, you can use _wfopen. So, you can have a console application in Unicode and all problems are solved? No. Standard input and standard output are 8-bit, and a code page is used. And it has to remain so, if you want the old and the new applications to be able to communicate. So, the logical conclusion is that UTF-8 needs to be used instead of a code page. Unfortunately, Windows has problems with that. Try MODE CON: CP SELECT=65001. Much of it works, but batch files don't run.

Now suppose Windows does work correctly with the code page set to UTF-8. You create an application that reads standard input, counts the words longer than 10 codepoints and passes the input unmodified to standard output (a sketch follows the list below). What happens:

* set CP to Latin 1, process Latin 1: correct result
* set CP to Latin 1, process UTF-8: wrong result
* set CP to UTF-8, process UTF-8: correct result
* set CP to UTF-8, process Latin 1: wrong result, corrupted output
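Here is the sketch referred to above (Python; the word-length threshold is illustrative). The filter has to commit to one assumed code page for decoding, and the counts are only correct when that assumption matches the bytes actually received; the pass-through itself stays byte-for-byte:

    import sys

    ASSUMED_CODE_PAGE = "utf-8"   # what the program believes the console uses

    def count_long_words(data: bytes, encoding: str, limit: int = 10) -> int:
        # A wrong encoding assumption silently produces wrong counts.
        text = data.decode(encoding, errors="replace")
        return sum(1 for word in text.split() if len(word) > limit)

    if __name__ == "__main__":
        raw = sys.stdin.buffer.read()
        sys.stdout.buffer.write(raw)    # pass the bytes through unmodified
        print(count_long_words(raw, ASSUMED_CODE_PAGE), file=sys.stderr)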


Now, I wonder why Windows is not supporting UTF-8 as much as one would want.



Lars





RE: Invalid UTF-8 sequences (was: Re: Nicest UTF)

2004-12-11 Thread Lars Kristan
Title: RE: Invalid UTF-8 sequences (was: Re: Nicest UTF)





John Cowan wrote:
 However, although they are *technically* octet sequences, they
 are *functionally* character strings. That's the issue.
Nicely put! But UTC does not seem to care.


 
  The point I'm making is that *whatever* you do, you are still
  asking for implementers to obey some convention on conversion
  failures for corrupt, uninterpretable character data.
  My assessment is that you'd have no better success at making
  this work universally well with some set of 128 magic bullet
  corruption pills on Plane 14 than you have with the
  existing Quoted-Unprintable as a convention.
 
 It doesn't have to work universally; indeed, it becomes a QOI issue.
 Allocating representations of bytes with bits that are high makes
 it possible to do something recoverable, at very little expense to the
 Unicode Consortium.
Except that the expense should be slightly higher. The importance of these replacement codepoints is still underestimated. They belong in the BMP. And at least there is no way anyone can blame the UTC for a cultural bias in this case: these codepoints are universal.

 
  Further, as it turns out that Lars is actually asking for
  standardizing corrupt UTF-8, a notion that isn't going to
  fly even two feet, I think the whole idea is going to be
  a complete non-starter.
 
 I agree that that part won't fly, absolutely.
Then I'll have to restructure it.


Lars





RE: Invalid UTF-8 sequences (was: Re: Nicest UTF)

2004-12-11 Thread Lars Kristan
Title: RE: Invalid UTF-8 sequences (was: Re: Nicest UTF)






Kenneth Whistler wrote:
 Lars responded:
 
   ... Whatever the solutions
   for representation of corrupt data bytes or uninterpreted data
   bytes on conversion to Unicode may be, that is irrelevant to the
   concerns on whether an application is using UTF-8 or UTF-16
   or UTF-32.
 
  The important fact is that if you have an 8-bit based 
 program, and you
  provide a locale to support UTF-8, you can keep things 
 working (unless you
 ^^^
 
 You can keep *some* things *sorta* working.
I didn't say that this is all that needs to be done. But the way you say it makes one think that this is not even the right track.


  prescribe validation). But you cannot achieve the same if 
 you try to base
  your program on 16 or 32 bit strings. 
 
 Of course you can. You just have to rewrite the program to handle
 16-bit or 32-bit strings correctly. You can't pump them through
 8-bit pipes or char* API's, but it's just silly to try that, because
 they are different animals to begin with.
Correctly? Strings? There are no strings and no encodings in a UNIX filesystem. Please clarify.


 
 By the way, I participated as an engineer in a multi-year project
 that shifted an advanced, distributed data analysis system
 from an 8-bit character set to 16-bit Unicode. *All* 
 user-visible string
 processing was converted over -- and that included proprietary
 file servers, comm servers, database gateways, networking code,
 a proprietary 32-bit workstation GUI implementation, and a suite
 of object-oriented application tools, including a spreadsheet,
 plotting tool, query and database reporting tools, and much more.
 It worked cross-platform, too.
 
 It was completed, running, and *delivered* to customers in 1994,
 a decade ago.
OK, was this a fresh development, or was this an upgrade of an existing system?
Did the existing system contain user data that needed to be converted?
Was this data all in ASCII?
Was this data all in a single code page?
Latin 1 perhaps?
How much of that data was in UTF-8?


 
 You can't bamboozle me with any of this it can't be done with
 16-bit strings BS.


BS? Bamboozle? One learns all sorts of new words here on this mailing list. Frankly, I find it interesting to read many historical and cultural facts in off-topic discussions, but I have a feeling I am not the only one and that many people prefer to engage in those. And that often original questions remain unanswered. And interesting ideas unexplored.

I know it is hard to follow someone else's ideas, spread over many mails, already sidetracked by those who think they understand what is being discussed and by those who can't distinguish between following a standard and changing or extending it. In the end, statements torn out of context do in fact look as if they're nonsense.

Much of your response (in this particular mail, not in general) is just that. One misinterpretation after another. And detailed explanations of things that are not even being discussed. Non-conformances being pointed out, where consequences of proposed changes should in fact be discussed. I am disappointed by this attitude, even more so because it comes from one of the most respected people on this mailing list.

Examples:
 Yes you can.
 No, you need not -- that is non-conformant, besides.
 http://www.unicode.org/Public/UNIDATA/
 Utterly non-conformant.
 Also utterly nonconformant.


I suppose surrogates were also non-conformant at the time they were proposed. Can I interpret your responses as meaning that surrogates should never have been accepted into the Unicode standard?

 I just don't understand these assertions at all.
I have given plenty of examples.


 First of all it isn't UNIX data or Windows data -- it is
 end user's data, which happens to be processed in software
 systems which in turn are running on a UNIX or Windows OS.
This is resorting to a philosophical answer, picking on words.


 I work for a company that *routinely* runs applications that
 cross the platform barriers in all sorts of ways. It works
 because character sets are handled conformantly, and conversions
 are done carefully at platform boundaries -- not because some
 hack has been added to UTF-8 to preserve data corruptions.
Sybase, yes. A very controlled environment. The fact that validity of data *can* be guaranteed in your particular environment gives you not more, but less right to make judgements about other environments and claim the problems can be solved 'by doing things correctly'.

  If the purpose of Unicode is to to define bricks for plain 
 text, then what
  the hell are the surrogates doing in there? 
 
 This seems to represent a complete misunderstanding of the Unicode
 character encoding forms.
 
 This is completely equivalent to examining all the UTF-8 bytes
 and then asking what the hell are 0x80..0xF4 doing in there?
 And if you don't understand the analogy, then I submit that
 you don't understand Unicode

infinite combinations, was Re: Nicest UTF

2004-12-11 Thread Peter R. Mueller-Roemer
Philippe Verdy wrote: 


The repertoire of all possible combining characters sequences is 
already infinite in Unicode, as well as the number of default 
grapheme clusters they can represent.

For a fixed length of combining character sequence (base + 3 combining 
marks is the most I have seen graphically distinguishable) the repertoire 
is still finite.
I am enthused about some nicely distinguishable sequences, e.g. u + macron 
+ diaeresis shows nicely as a long long vowel u-umlaut, whereas u + 
diaeresis + macron displays as a long vowel u with a trema above, to be 
spoken as a separate vowel. BRAVO! I do not see a good reason why this does 
not work for all other base characters, particularly all vowels (e, 
i, o combine in undesirable fashion, a only in one newest version of a 
Unicode font).

I can add an acute accent to each sequence, but the accent is smudged 
into the previous complex character in an ugly default overtype mode.

Another GOOD solution: the single combining Hebrew dagesh point 'finds' 
the right 'inner' place in all the Hebrew consonants and some Latin base 
characters; why should overtype ugliness be allowed in many other cases?

There seems to be no difficulty in implementing composition of complex 
characters from the inside out.

Can't we join forces to request a default graphical representation, so 
that legible, distinguishable complex symbols must be generated by 
future Unicode fonts? The technical details are not too complex, and the 
expressiveness and ease of use of Unicode would be greatly enhanced.

The Greek acute and grave accents should by themselves combine centered 
over any base character; and if together with a spiritus asper or lenis, 
the accent should be minimally separated from it horizontally and the pair 
displayed centered over the base character.

Hebrew vowel points and accents also need to be fitted under any single 
base character.

Samaritan complex characters should be composable of short combining 
sequences.

Peter R. Mueller-Roemer






RE: Invalid UTF-8 sequences (was: Re: Nicest UTF)

2004-12-11 Thread Lars Kristan
Title: RE: Invalid UTF-8 sequences (was: Re: Nicest UTF)





Kenneth Whistler wrote:
 Further, as it turns out that Lars is actually asking for
 standardizing corrupt UTF-8, a notion that isn't going to
 fly even two feet, I think the whole idea is going to be
 a complete non-starter.


Technically, I am not asking anything. I am just trying to discuss an approach which I think can be used to solve certain problems. And this approach does not need to be conformant at this point. If someone finds it suitable to make it conformant, even better, but at this point this is irrelevant to the discussion. Unless it is proven that it cannot be made conformant (by changing or amending the standard) because I have missed an important fact. But so far, I have not seen such a proof.


But suppose I am asking, therefore proposing - it would be several separate items:


1 - To assign codepoints for 128 (or 256) new surrogates(*), used for:
1.1 - Representing unassigned values when converting from an encoding to Unicode (optional).
1.2 - Representing invalid sequences when interpreting UTF-8 (optional).
The use of these would not be mandatory. Existing handling is still an option and can be preserved wherever it suits the needs, or changed where the new behavior is beneficial.

Representation of these codepoints in UTF-8 would be as per current standard.



2 - An alternative conversion from Unicode, to, say, UTF-8E (UTF-8E is _NOT_ Unicode(*)).
This conversion would reconstruct the original byte sequence from a Unicode string obtained by 1.2. This conversion pair is intended for use on platform or interface boundaries, if/where it is determined that they are suitable. For example, interfacing a UNIX filesystem and a UTF-8 pipe would require a UTF-8E &lt;-&gt; UTF-8 conversion. Interfacing a UNIX filesystem and a Windows filesystem would require a UTF-8E &lt;-&gt; UTF-16 conversion. (A sketch of such a round trip follows item 4 below.)


(*) If proposal #2 were not accepted, then the codepoints in proposal #1 would actually not be surrogates, but simply codepoints and nothing else. Even if proposal #2 is accepted, it is still not clear whether those should really be called surrogates, since they would convert among all UTFs just as any other codepoint; only their representation in UTF-8E would differ. Note that UTF-8E is not Unicode, but would be standardized in Unicode. If the 'U' in 'UTF' is a problem, then any other name can be chosen. Consider it a working name and be aware of what it is and is not.


3 - If the UTC cannot agree that the BMP should be used for proposal #1, I would advise against a decision to assign non-BMP codepoints for the purpose. I believe less damage would be done by postponing the decision than by making a wrong decision. It is not just about how much disk space or bandwidth is used. For example, if both filesystems have a 256-character limit for a filename, the limitations are consistent (at least in one direction) if the BMP is used, and not if any other plane is used.


4 - If neither of the proposals is accepted, it would be beneficial if the UTC would manage to preserve at least one suitable block (for example U+A4xx or U+ABxx) of 256 codepoints intact, to facilitate a future decision.
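The sketch referred to under proposal #2: a round trip of the kind described, in Python, with U+EE80..U+EEFF used purely as a placeholder block for the 128 codepoints (the actual assignment is exactly what the proposal leaves open):

    BASE = 0xEE80   # placeholder block for the 128 proposed codepoints

    def decode_utf8e(raw: bytes) -> str:
        out, i = [], 0
        while i < len(raw):
            for n in (4, 3, 2, 1):           # longest well-formed run wins
                try:
                    out.append(raw[i:i + n].decode("utf-8"))
                    i += n
                    break
                except UnicodeDecodeError:
                    continue
            else:                            # no valid sequence starts here
                out.append(chr(BASE + (raw[i] - 0x80)))
                i += 1
        return "".join(out)

    def encode_utf8e(s: str) -> bytes:
        out = bytearray()
        for ch in s:
            cp = ord(ch)
            if BASE <= cp < BASE + 0x80:
                out.append(0x80 + cp - BASE)  # marker codepoint -> original byte
            else:
                out.extend(ch.encode("utf-8"))
        return bytes(out)

    raw = b"caf\xe9.txt"                           # Latin-1 bytes in a UTF-8 world
    assert encode_utf8e(decode_utf8e(raw)) == raw  # lossless round trip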


Lars Kristan





RE: Software support costs (was: Nicest UTF

2004-12-11 Thread Carl W. Brown
Philippe,

However, within the program itself UTF-8 presents a
problem when looking for specific data in memory buffers.
It is nasty, time consuming and error prone.  Mapping
UTF-16 to code points is a snap as long as you
do not have a lot of surrogates.  If you do then probably
UTF-32 should be considered.

 This is not demonstrated by experience. Parsing UTF-8 or
 UTF-16 is not complex, even in the case of random accesses
 to the text data, because you always have a bounded and
 small limit to the number of steps needed to find
 the beginning offset of a fully encoded code point: for
 UTF-16, this means at most 1 range test and 1 possible
 backward step. For UTF-8, this limit for random accesses
 is at most 3 range tests and 3 possible backward steps.
 UTF-8 and UTF-16 are very easily supporting backwards and
 forwards enumerators; so what else do you need to perform
 any string handling?
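For reference, the backward step described in the quoted text is indeed short; a sketch of it in Python over a byte buffer:

    def utf8_sequence_start(buf: bytes, i: int) -> int:
        # Continuation bytes all match 10xxxxxx, so backing up to the start
        # of the enclosing sequence never takes more than 3 steps.
        while i > 0 and (buf[i] & 0xC0) == 0x80:
            i -= 1
        return i

    data = "h\u00e9llo".encode("utf-8")     # b'h\xc3\xa9llo'
    print(utf8_sequence_start(data, 2))     # offset 2 is inside U+00E9 -> 1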

Sorry, but I was unclear.  I was thinking of raw data displays in hex.  For
example with a sniffer, a debugger or a memory dump.

In this case, what is a very simple algorithm is not easy when you are
manually converting from UTF-8 to code points by disassembling the hex into
bits and recombining the bits to find the code points.  With UTF-16, at best
you may have to do a little-endian flip of the hex digits, except for
surrogates, which should be few.

Some dumps provide not only hex but also ASCII representations of the
data.  UTF-8 is great for finding tags, like in XML.  It allows you to analyze
the tree because the tags show up in the ASCII side of the trace data
display, making it easy to find your specific data elements as well as
missing tags, tree structure errors or problems with data that is
not well formed.  It is rare that systems use non-ASCII tags.  Certainly,
since the tags are only used internally, there is no reason that they cannot
be limited to ASCII just for improved support.
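A small sketch of the kind of hex-plus-ASCII display meant here, in Python: in a UTF-8 trace the ASCII column keeps the markup readable even when the payload is not (the XML snippet is an arbitrary example):

    def hexdump(data: bytes, width: int = 16) -> None:
        for off in range(0, len(data), width):
            chunk = data[off:off + width]
            hexpart = " ".join(f"{b:02x}" for b in chunk)
            asciipart = "".join(chr(b) if 0x20 <= b < 0x7f else "." for b in chunk)
            print(f"{off:08x}  {hexpart:<{width * 3}} {asciipart}")

    hexdump("<name>M\u00fcller</name>".encode("utf-8"))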

Carl







Re: Nicest UTF

2004-12-11 Thread Philippe Verdy
From: Marcin 'Qrczak' Kowalczyk [EMAIL PROTECTED]
Regarding A, I see three choices:
1. A string is a sequence of code points.
2. A string is a sequence of combining character sequences.
3. A string is a sequence of code points, but it's encouraged
  to process it in groups of combining character sequences.
I'm afraid that anything other than a mixture of 1 and 3 is too
complicated to be widely used. Almost everybody is representing
strings either as code points, or as even lower-level units like
UTF-16 units. And while 2 is nice from the user's point of view,
it's a nightmare from the programmer's point of view:
Consider that the normalized forms are trying to approach choice number 
2, to create more predictable combining character sequences which can still 
be processed by algorithms as just streams of code points.
Remember that the total number of possible code points is finite; but not 
the total number of possible combining sequences, meaning that text handling 
will necessarily have to make decisions based on a limited set of 
properties.

Note however that for most Unicode strings, the composite character 
properties are those of the base character in the sequence. Note also that 
for some languages/scripts, the linguistically correct unit of work is the 
grapheme cluster; Unicode just defines default grapheme clusters, which 
can span several combining sequences (see for example the Hangul script, 
written with clusters made of multiple combining sequences, where the base 
character is a Unicode jamo, itself sometimes made of multiple simpler jamos 
that Unicode does not allow to be decomposed as canonically equivalent strings, 
even though this decomposition is inherent in the structure of the script 
itself and not bound to the language, which Unicode will not 
standardize).

It's hard to create a general model that will work for all scripts encoded 
in Unicode. There are too many differences. So Unicode just appears to 
standardize a higher level of processing with combining sequences and 
normalization forms that are better approaching the linguistic and semantic 
of the scripts. Consider this level as an intermediate tool that will help 
simplify the identification of processing units.

The reality is that a written language is actually more complex than what 
can be approached in a single definition of processing units. For many other 
similar reasons, the ideal working model will be with simple and 
enumerable abstract characters with a finite number of code points, and with 
which actual and non-enumerable characters can be composed.

But the situation is not ideal for some scripts, notably ideographic ones, 
due to their very complex and often inconsistent composition rules or 
layout, and because they require allocating many code points, one for each 
combination. Working with ideographic scripts requires many more character 
properties than with other scripts (see for example the huge and varied 
properties defined in UniHan, which are still not standardized due to the 
difficulty of representing them and the slow discovery of errors, omissions, or 
contradictions found in the various sources for this data...)




RE: Invalid UTF-8 sequences (was: Re: Nicest UTF)

2004-12-11 Thread Lars Kristan
Title: RE: Invalid UTF-8 sequences (was: Re: Nicest UTF)





Philippe Verdy wrote:
 This is a known caveat even for Unix, when you look at the 
 tricky details of 
 the support of Windows file sharing through Samba, when the 
 client requests 
 a file with a short 8.3 name, that a partition used by 
 Windows is supposed 
 to support.


Do you know how Samba is configured to present UTF-8 filenames properly to Windows? What happens to Latin 1 filenames? Are the invalid sequences escaped? How?


Lars





RE: Nicest UTF

2004-12-11 Thread Lars Kristan
Title: RE: Nicest UTF






Missed this one the other day, but cannot let it go...


Marcin 'Qrczak' Kowalczyk wrote:


  filenames, what is one supposed to do? Convert all 
 filenames to UTF-8?
 
 Yes.
 
  Who will do that?
 
 A system administrator (because he has access to all files).
My my, you are assuming all files are in the same encoding. And what about all the references to the files in scripts? In configuration files? Soft links? If you want to break things, this is definitely the way to do it.

  If you keep all processing in UTF-8, then this is a decision you can
  postpone.
 
 You mean, various programs will break at various points of time,
 instead of working correctly from the beginning?
So far nothing broke. Because all the programs are in UTF-8. If you tried to write it in UTF-16, it would break. So nobody does it. Except those that must.

  I didn't encourage users to mix UTF-8 filenames and Latin 1 
 filenames.
  Do you want to discourage them?
 
 Mixing any two incompatible filename encodings on the same file system
 is a bad idea.
As soon as you realize you cannot convert filenames to UTF-8, you will see that all you can do is start adding new ones in UTF-8. Or forget about Unicode.


Lars





Re: infinite combinations, was Re: Nicest UTF

2004-12-11 Thread Philippe Verdy
From: Peter R. Mueller-Roemer [EMAIL PROTECTED]
For a fixed length of combining character sequence (base + 3 combining 
marks is the most I have seen graphically distinguishable) the repertoire 
is still finite.
I do think that you are underestimating the repertoire. Also, Unicode does 
NOT define an upper bound on the length of combining sequences, nor on 
the length of default grapheme clusters (which can be composed of 
multiple combining sequences, for example in the Hangul or Tibetan scripts). 
Your estimate also ignores various layouts found in Asian texts, and the 
particular structures of historic texts, which can use many diacritics on 
top of a single base letter starting a combining sequence. The model of 
these scripts (for example Hebrew) implies the juxtaposition of up to 13 or 15 
levels of diacritics on the same base letter!

In practice, it's impossible to enumerate all existing combinations (and 
ensure that they would be assigned a unique code within a reasonably limited 
code space), and that's why a simpler model based on more basic but 
combinable code points is used in Unicode: it frees Unicode from having to 
encode all of them (this is already a difficult task for the Han script, 
which could have been encoded with combining sequences if the algorithms 
needed to create the necessary layout had not required so many 
complex rules and so many exceptions...)




Re: Nicest UTF

2004-12-11 Thread D. Starner
Marcin 'Qrczak' Kowalczyk writes:
 But demanding that each program which searches strings checks for 
 combining classes is I'm afraid too much. 

How is it any different from a case-insensitive search?
 
  Does \n followed by a combining code point start a new line? 
  
  The Standard says no, that's a defective combining sequence. 
 
 Is there *any* program which behaves this way? 

I misstated that; it's a new line followed by a defective combining sequence.
 
 It doesn't matter that accented backslashes don't occur practice. I do 
 care for unambiguous, consistent and simple rules. 

So do I; and the only unambiguous, consistent and simple rule that won't
give users hell is that ba never matches bä. Any programs for end-users
must follow that rule.
 
 My current implementation doesn't support filenames which can't be 
 encoded in the current default encoding. 

The right thing to do, IMO, would be to support filenames as byte strings,
and let the programmer convert them back and forth between character strings,
knowing that it won't roundtrip.

 If the 
 program assumed that an accented slash is not a directory separator, 
 I expect possible security holes (the program thinks that a string 
 doesn't include slashes, but from the OS point of view it does). 

If the program assumes that an accented slash is not a directory separator,
then it's wrong. Any way you go is going to require sensitivity.

  The rules you are offering are only simple and unambiguous to the 
  programmer; they appear completely random to the end user. 
 
 And yours are the opposite :-) 

Programmers get to spend a lot of time dealing with the random
requirements of users, not the other way around.
 





RE: Nicest UTF

2004-12-11 Thread D. Starner
Lars Kristan writes:
 
  A system administrator (because he has access to all files).
 My my, you are assuming all files are in the same encoding. And what about
 all the references to the files in scripts? In configuration files? Soft
 links? If you want to break things, this is definitely the way to do it.

Was it ever really wise to use non-ASCII file names in scripts and configuration
files? It's not very hard to convert soft links at the same time. Nor, really
should it be too hard to figure out the encodings; /home/foo/.bashrc probably
tells you, as well as simple logic. 

Even if you can't do a system-wide change, it's easy enough to change the
system files, and post a message about switching to UTF-8, and offering to
assist any users with the change.






Re: Nicest UTF

2004-12-11 Thread Marcin 'Qrczak' Kowalczyk
D. Starner [EMAIL PROTECTED] writes:

  This implies that every programmer needs an indepth knowledge of 
  Unicode to handle simple strings. 
 
 There is no way to avoid that. 

 Then there's no way that we're ever going to get reliable Unicode
 support.

This is probably true.

I wonder whether things could have been done significantly better,
or it's an inherent complexity of text. Just curious, it doesn't help
with the reality.

 If the runtime automatically performed NFC on input, then a part of a 
 program which is supposed to pass a string unmodified would sometimes 
 modify it. Similarly with NFD.

 No. By the same logic you used above, I can expect the programmer to
 understand their tools, and if they need to pass strings unmodified,
 they shouldn't load them using methods that normalize the string.

That's my point: if he normalizes, he does this explicitly.

If a standard (a programming language, XML, whatever) specifies that
identifiers should be normalized before comparison, a program should
do this. If it specifies that Cf characters are to be ignored, then a
program should comply. A standard doesn't have to specify such things
however, so a programming language shouldn't do too much automatically.
It's easier to apply a transformation than to undo a transformation
applied automatically.

 Sometimes things get ambiguous if one day ŝ is matched by s and one
 day ŝ isn't? That's absolutely wrong behavior; the program must serve
 the user, not the programmer.

If I use grep to search for a combining acute, I bet it will currently
match cases where it's a separate combining character but will not
match precomposed characters.

Do you say that this should be changed?

Hey, Linux grep matches only a single byte with '.', even in a UTF-8 locale.
Now, I can agree that this should be changed.

But demanding that each program which searches strings checks for
combining classes is I'm afraid too much.

 Does \n followed by a combining code point start a new line? 

 The Standard says no, that's a defective combining sequence.

Is there *any* program which behaves this way?

How useful is a rule in a standard which nobody obeys?

 Does a double quote followed by a combining code point start a
 string literal?

 That would depend on your language. I'd prefer no, but it's obvious
 many have made other choices.

Since my language is young and almost doesn't have users, I can even
change decisions made earlier: I'm not constrained by compatibility
yet.

But if the lexical structure of the program worked in terms of combining
character sequences, it would have to be somehow supported by generic
string processing functions, and it would have to work consistently for
all lexical features. For example, */ followed by a combining accent
would not end a comment, an accented backslash would not need escaping in
a string literal, and something unambiguous would have to be done with
an accented newline.

Such rules would be harder to support with most text processing tools.
I know no language in which searching for a backslash in a string would
not find an accented backslash.

It doesn't matter that accented backslashes don't occur in practice. I do
care for unambiguous, consistent and simple rules.
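A concrete illustration of the ambiguity, in Python: a code-point-level search finds the backslash even when a combining accent follows it, while a rule phrased in terms of combining character sequences would have to say it does not match:

    s = "a\\\u0301b"      # a backslash followed by a combining acute accent

    print("\\" in s)       # True:  a code-point-level search finds the backslash
    print("\\b" in s)      # False: the accent sits between them at this level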

 Does a slash followed by a combining code point separate 
 subdirectory names?

 In Unix, yes; that's because filenames in Unix are byte streams with
 the byte 0x2F acting as a path separator.

My current implementation doesn't support filenames which can't be
encoded in the current default encoding. The encoding can be changed
from within a program (perhaps locally during execution of some code).
So one can process any Unix filename by temporarily setting the
encoding to Latin1. It's unfortunate that the default setting is more
restrictive than the OS, but I have found no sensible alternative
other than encouraging processing strings in their transportation
encoding.

Anyway, if a string *is* accepted as a file name, the program's idea
about directory separators is the same as the OS (as long as we assume
Unix; I don't yet provide any OS-generic pathname handling). If the
program assumed that an accented slash is not a directory separator,
I expect possible security holes (the program thinks that a string
doesn't include slashes, but from the OS point of view it does).

 The rules you are offering are only simple and unambiguous to the
 programmer; they appear completely random to the end user.

And yours are the opposite :-)

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/


Re: Nicest UTF

2004-12-10 Thread John Cowan
Marcin 'Qrczak' Kowalczyk scripsit:

  The XML/HTML core syntax is defined with fixed behavior of some
  individual characters like '<', '>', quotation marks, and with special
  behavior for spaces.
 
 The point is: what characters mean in this sentence. Code points?
 Combining character sequences? Something else?

Neither.  Unicode characters.

-- 
May the hair on your toes never fall out! John Cowan
--Thorin Oakenshield (to Bilbo) [EMAIL PROTECTED]



Re: Software support costs (was: Nicest UTF

2004-12-10 Thread Philippe Verdy
From: Carl W. Brown [EMAIL PROTECTED]
Philippe,
Also a broken opening tag for HTML/XML documents
In addition to not having endian problems UTF-8 is also useful when 
tracing
intersystem communications data because XML and other tags are usually in
the ASCII subset of UTF-8 and stand out making it easier to find the
specific data you are looking for.
If you are working on XML documents without parsing them first, at least at 
the DOM level (I don't say after validation), then any generic string 
handling will likely fail, because you may break the XML well-formedness of 
the document.

Note however that you are not required to split the document into many 
string objects: you could as well create a DOM tree with nodes referencing 
pairs of offsets in the source document, if you did not also have to convert 
the numeric character references.

If not doing so, you'll need to create subnodes within text elements, i.e. 
working at a level below the normal leaf level in DOM. But anyway, this is 
what you need to do when there are references to named entities that break 
the text level; and for simplicity, you would still need to parse CDATA 
sections to recreate single nodes that may be split by CDATA end/start 
markers inserted in a text stream that contains the ]]> sequence of three 
characters.

Clearly, the normative syntax of XML comes first before any other 
interpretation of the data in individual parsed nodes as plain-text. So in 
this case, you'll need to create new string instances to store the parsed 
XML nodes in the DOM tree. Under this consideration, the encoding of the XML 
document itself plays a very small role, and as you'll need to create a 
separate copy for the parsed text, the encoding you'll choose for parsed 
nodes with which you can create a DOM tree can become independent of the 
encoding actually used in the source XML data, notably because XML allows 
many distinct encodings in multiple documents that have cross-references.

This means that implementing a conversion of the source encoding to the 
working encoding for DOM tree nodes cannot be avoided, unless you are 
limiting your parser to handle only some classes of XML documents (remember 
that XML uses UTF-8 as the default encoding, so you can't ignore it in any 
XML parser, even if you later decide to handle the parsed node data with 
UTF-16 or UTF-32).

Then a good question is which preferred central encoding you'll use for the 
parsed nodes: this depends on the parser API you use. If this API is 
written for C with byte-oriented null-terminated strings, UTF-8 will be the 
best representation (you may choose GB18030). If this API uses a wide-char C 
interface, UTF-16 or UTF-32 will most often be the only easy solution. In 
both cases, because the XML document may contain nodes with null bytes 
(represented by numeric character references like &#0;), your API will need 
to return an actual string length.

Then what your application will do with the parsed nodes (i.e. whether it 
will build a DOM tree, or it will use nodes on the fly to create another 
document) is the application's choice. If a DOM tree is built, an important 
factor will be the size of XML documents that you can represent and work 
with in memory for the global DOM tree nodes. Whether these nodes, built by 
the application, will be left in UTF-8 or UTF-16 or UTF-32, or stored with a 
more compact representation like SCSU, is an application design choice.

If XML documents are very large, the size of the DOM tree will also become 
very large, and if your application then needs to perform complex 
transformations on the DOM tree, the constant need to navigate in the tree 
will mean that there will be frequent random accesses to the tree nodes. If 
the whole tree does not fit well in memory, this may put a lot of load on the 
system memory manager, meaning many swaps to disk. Compressing nodes will 
help reduce the I/O overhead and will improve data locality, meaning 
that the overhead of decompression will become much lower than the 
gain in performance from reduced system resource usage.

However, within the program itself UTF-8 presents a problem when looking 
for
specific data in memory buffers.  It is nasty, time consuming and error
prone.  Mapping UTF-16 to code points is a snap as long as you do not have 
a
lot of surrogates.  If you do then probably UTF-32 should be considered.
This is not demonstrated by experience. Parsing UTF-8 or UTF-16 is not 
complex, even in the case of random accesses to the text data, because you 
always have a bounded and small limit to the number of steps needed to find 
the beginning offset of a fully encoded code point: for UTF-16, this means 
at most 1 range test and 1 possible backward step. For UTF-8, this limit for 
random accesses is at most 3 range tests and 3 possible backward steps. 
UTF-8 and UTF-16 very easily support backward and forward enumerators; 
so what else do you need to perform any string 
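
A minimal Python sketch of the bounded resynchronization described above (buffer and function names are illustrative):

    def utf8_sync_back(buf: bytes, i: int) -> int:
        # Step back over continuation bytes (10xxxxxx); at most 3 steps are
        # ever needed before the lead byte of the code point is reached.
        while i > 0 and 0x80 <= buf[i] <= 0xBF:
            i -= 1
        return i

    def utf16_sync_back(units, i: int) -> int:
        # One range test and at most one backward step: a low surrogate
        # means the code point started one unit earlier.
        if i > 0 and 0xDC00 <= units[i] <= 0xDFFF:
            i -= 1
        return i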

Re: Nicest UTF

2004-12-10 Thread D. Starner
Marcin 'Qrczak' Kowalczyk writes:

 D. Starner writes: 

  This implies that every programmer needs an in-depth knowledge of 
  Unicode to handle simple strings. 
 
 There is no way to avoid that. 

Then there's no way that we're ever going to get reliable Unicode
support. 
 
 If the runtime automatically performed NFC on input, then a part of a 
 program which is supposed to pass a string unmodified would sometimes 
 modify it. Similarly with NFD.

No. By the same logic you used above, I can expect the programmer to
understand their tools, and if they need to pass strings unmodified,
they shouldn't load them using methods that normalize the string.
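
A minimal sketch of the distinction being argued here, assuming Python and its unicodedata module (the function names are mine): the programmer chooses explicitly whether input is normalized.

    import unicodedata

    def load_raw(path):
        # Pass the bytes through untouched; nothing is normalized.
        with open(path, 'rb') as f:
            return f.read()

    def load_normalized(path, form='NFC'):
        # Explicitly opt in to normalization on input.
        with open(path, encoding='utf-8') as f:
            return unicodedata.normalize(form, f.read())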
 
 You can't expect each and every program which compares strings to 
 perform normalization (e.g. Linux kernel with filenames). 

As has been pointed out here, Posix filenames are not character strings; 
they are byte strings. They quite likely aren't even valid UTF-8 strings.

  So S should _sometimes_ match an accented S? Again, I feel extended misery 
  of explaining to people why things aren't working right coming on. 
 
 Well, otherwise things get ambiguous, similarly to these XML issues. 

Sometimes things get ambiguous if one day &#349; is matched by s and one
day &#349; isn't? That's absolutely wrong behavior; the program must serve
the user, not the programmer. 's' cannot, should not, must not match '&#349;';
and if it must, then it absolutely always must match '&#349;', and some way
to make a regex that matches s but not &#349; must be designed. It doesn't
matter what problems exist in the world of programming; that is the
entirely reasonable expectation of the end user.
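
For concreteness, a small Python sketch of both behaviours (the pattern is only an illustration): a plain 's' matches inside a decomposed accented s, while a lookahead that rejects a following combining mark gives the "matches s but not the accented form" regex asked for above.

    import re
    import unicodedata

    decomposed = unicodedata.normalize('NFD', '\u015b')         # 's' + U+0301
    print(bool(re.search('s', decomposed)))                      # True
    print(bool(re.search('s(?![\u0300-\u036f])', decomposed)))   # False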

 Does \n followed by a combining code point start a new line? 

The Standard says no, that's a defective combining sequence.

 Does 
 a double quote followed by a combining code point start a string 
 literal? 

That would depend on your language. I'd prefer no, but it's obvious
many have made other choices.

 Does a slash followed by a combining code point separate 
 subdirectory names?

In Unix, yes; that's because filenames in Unix are byte streams with
the byte 0x2F acting as a path separator.
 
 It's hard enough to convince them that a 
 character is not the same as a byte. 

That contradicts your above statement, that every programmer needs an
in-depth knowledge of Unicode.

 In case I want to circumvent security or deliberately cause a piece of 
 software to misbehave. Robustness requires unambiguous and simple rules. 

The rules you are offering are only simple and unambiguous to the programmer;
they appear completely random to the end user. To have &#8814; sometimes start a
tag means that a user can't look at the XML and tell whether something opens
a tag or is just text. You might be able to expect that of all programmers, but you
can't expect it of all end users.
-- 





Re: Nicest UTF

2004-12-10 Thread Philippe Verdy
From: Philippe Verdy [EMAIL PROTECTED]
From: Marcin 'Qrczak' Kowalczyk [EMAIL PROTECTED]
Philippe Verdy [EMAIL PROTECTED] writes:
The XML/HTML core syntax is defined with fixed behavior of some
individual characters like '<', '>', quotation marks, and with special
behavior for spaces.
The point is: what characters mean in this sentence. Code points?
Combining character sequences? Something else?
See the XML character model document... XML ignores combining sequences. 
But for Unicode and for XML a character is an abstract character with a 
single code allocated in a *finite* repertoire. The repertoire of all 
possible combining character sequences is already infinite in Unicode, as 
well as the number of default grapheme clusters they can represent.
Note there are some differently relaxed definitions of what constitutes a 
character for XML.
If you look at the XML 1.0 Second Edition, it specifies that the document is 
text (defined only as a sequence of characters, which may represent 
markup or character data) and will only contain characters in this set:
Char   ::=   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | 
[#x10000-#x10FFFF]

But the comment following it specifies:
any Unicode character, excluding the surrogate blocks, FFFE, and FFFF.
which is considerably weaker (because it would include ALL basic controls in 
the range #x0 to #x1F, and not only TAB, LF, CR); the restrictive definition 
of Char above also includes the whole range of C1 controls (#x80..#x9F), 
so I can't understand why the Char definition is so restrictive on controls; 
in addition the definition of Char also *includes* many non-characters (it 
only excludes surrogates, and U+FFFE and U+FFFF, but forgets to exclude 
U+1FFFE and U+1FFFF, U+2FFFE and U+2FFFF, ..., U+10FFFE and U+10FFFF).

So XML does allow Unicode/ISO10646 non-characters... But not all. Apparently 
many XML parsers seem to ignore the restriction of Char above, notably in 
CDATA sections

The alternative is then to use numeric character references, as defined by 
this even weaker production (in 4.1. Character and Entity References):

CharRef ::= '&#' [0-9]+ ';'
| '&#x' [0-9a-fA-F]+ ';'
but with this definition:
A character reference refers to a specific character in the ISO/IEC 10646 
character set, for example one not directly accessible from available input 
devices.

Which is exactly the purpose of encoding something like &#1; to encode a 
SOH character U+0001 (which after all is a valid Unicode/ISO/IEC10646 
character), or even a NUL character.

The CharRef production however is annotated by a Well-Formedness 
Constraint, Legal Character:
Characters referred to using character references must match the production 
for Char.

Note however that nearly all XML parsers don't seem to honor this constraint 
(like SGML parsers...)!

This was later amended in an errata for XML 1.0 which now says that the list 
of code points whose use is *discouraged* (but explicitly *not* forbidden) 
for the Char production is now:
[#x7F-#x84], [#x86-#x9F], [#xFDD0-#xFDDF],
[#x1FFFE-#x1FFFF], [#x2FFFE-#x2FFFF], [#x3FFFE-#x3FFFF],
[#x4FFFE-#x4FFFF], [#x5FFFE-#x5FFFF], [#x6FFFE-#x6FFFF],
[#x7FFFE-#x7FFFF], [#x8FFFE-#x8FFFF], [#x9FFFE-#x9FFFF],
[#xAFFFE-#xAFFFF], [#xBFFFE-#xBFFFF], [#xCFFFE-#xCFFFF],
[#xDFFFE-#xDFFFF], [#xEFFFE-#xEFFFF], [#xFFFFE-#xFFFFF],
[#x10FFFE-#x10FFFF].
This clause is not really normative, but just adds to the confusion... Then 
comes XML 1.1, which extends the restrictive Char production to:
Char   ::=   [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
with the same comment (any Unicode character, excluding the surrogate 
blocks, FFFE, and FFFF). So in XML 1.0, the comment was accurate, not the 
formal production... In XML 1.1, all C0 and C1 controls (except NUL) are now 
allowed, but the use of some of them is restricted in some cases:

RestrictedChar   ::=   [#x1-#x8] | [#xB-#xC] | [#xE-#x1F] | [#x7F-#x84] | 
[#x86-#x9F]

What is even worse is that XML 1.1 now reallows NUL for system identifiers 
and URIs, through escaping mechanisms. Clearly, the XML specification is 
inconsistent there, and this would explain why most XML parsers are more 
permissive than what is given in the Char production of the XML 
specification, and that they simply refer to the definition of valid 
codepoints for Unicode and ISO/IEC 10646, excluding only surrogate code 
points (a valid code point can be a non-character, and can also be a 
NUL...): the XML parser will accept those code points, but will let the 
validity control to the application using the parsed XML data, or will offer 
some tuning options to enable this Char filter (that depends on XML 
version...).
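
A sketch of the kind of optional post-parse filter described here (helper names are mine), using the XML 1.0 Char production quoted earlier:

    def is_xml10_char(cp: int) -> bool:
        # Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
        #        | [#x10000-#x10FFFF]
        return (cp in (0x9, 0xA, 0xD)
                or 0x20 <= cp <= 0xD7FF
                or 0xE000 <= cp <= 0xFFFD
                or 0x10000 <= cp <= 0x10FFFF)

    def offending_code_points(text: str):
        # Let the application, not the parser, decide what to do with them.
        return [(i, hex(ord(ch))) for i, ch in enumerate(text)
                if not is_xml10_char(ord(ch))]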

See also the various errata for XML 1.1, related to RestrictedChar...
Or to the list of characters whose use is discouraged (meaning explicitly 
not forbidden, so allowed...):

[#x7F-#x84], [#x86-#x9F], [#xFDD0-#xFDDF],
[#x1FFFE-#x1FFFF], [#x2FFFE-#x2FFFF], [#x3FFFE-#x3FFFF],
[#x4FFFE-#x4FFFF], [#x5FFFE-#x5FFFF], 

Re: Nicest UTF

2004-12-10 Thread John Cowan
Marcin 'Qrczak' Kowalczyk scripsit:

 http://www.w3.org/TR/2000/REC-xml-20001006#charsets
 implies that the appropriate level for parsing XML is code points.

You are reading the XML Recommendation incorrectly.  It is not defined
in terms of codepoints (8-bit, 16-bit, or 32-bit) but in terms of
characters.  XML processors are required to process UTF-8 and UTF-16,
and may process other character encodings or not.  But the internal
model is that of characters.  Thus surrogate code points are not
allowed.

-- 
John Cowan  www.reutershealth.com  www.ccil.org/~cowan  [EMAIL PROTECTED]
Arise, you prisoners of Windows / Arise, you slaves of Redmond, Wash,
The day and hour soon are coming / When all the IT folks say Gosh!
It isn't from a clever lawsuit / That Windowsland will finally fall,
But thousands writing open source code / Like mice who nibble through a wall.
--The Linux-nationale by Greg Baker



Re: Nicest UTF

2004-12-10 Thread D. Starner
John Cowan writes:

 You are reading the XML Recommendation incorrectly.  It is not defined
 in terms of codepoints (8-bit, 16-bit, or 32-bit) but in terms of
 characters.  XML processors are required to process UTF-8 and UTF-16,
 and may process other character encodings or not.  But the internal
 model is that of characters.  Thus surrogate code points are not
 allowed.

Okay, I'm confused. Does &#8814; open a tag? Does it matter if it's composed or 
decomposed?
-- 





Re: Nicest UTF

2004-12-10 Thread John Cowan
Philippe Verdy scripsit:

 If you look at the XML 1.0 Second Edition

The Second Edition has been superseded by the Third.

 Char   ::=   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | 
 [#x10000-#x10FFFF]

That is normative.

 But the comment following it specifies:

That comment is not normative and not meant to be precise.

 the restrictive 
 definition of Char above also includes the whole range of C1 controls 

By oversight.

 (#x80..#x9F), so I can't understand why the Char definition is so 
 restrictive on controls; in addition the definition of Char also 
 *includes* many non-characters (it only excludes surrogates, and U+FFFE 
 and U+FFFF, but forgets to exclude U+1FFFE and U+1FFFF, U+2FFFE and 
 U+2FFFF, ..., U+10FFFE and U+10FFFF).

By oversight again.

 Note however that nearly all XML parsers don't seem to honor this 
 constraint (like SGML parsers...)!

Please specify the parsers that do and don't honor this.  Any which
don't honor it are buggy, and any documents which exploit those bugs
are not XML.

 What is even worse is that XML 1.1 now reallows NUL for system 
 identifiers and URIs, through escaping mechanisms.

Not true.  U+0000 is absolutely excluded in both XML 1.0 and XML 1.1.

-- 
I could dance with you till the cows   John Cowan
come home.  On second thought, I'd  http://www.ccil.org/~cowan
rather dance with the cows when you http://www.reutershealth.com
came home.  --Rufus T. Firefly [EMAIL PROTECTED]



Re: Nicest UTF

2004-12-10 Thread John Cowan
Philippe Verdy scripsit:

 Okay, I'm confused. Does &#8814; open a tag? Does it matter if it's 
 composed or decomposed?
 
 It does not open a XML tag.
 It does matter if it's composed (won't open a tag) or decomposed (will 
 open a tag, but with a combining character, invalid as an identifier 
 start)

Let's be precise here.  If the 7-character character sequence &#8814;
appears in an XML document, it never opens a tag and it is never changed
by normalization.  If the 1-character sequence consisting of a single
U+226E appears in an XML document, and that document is put through
NF(K)D, it will become not well-formed.  However, NF(K)D is not
recommended for XML documents, which should be in NFC.

-- 
First known example of political correctness:   John Cowan
After Nurhachi had united all the otherhttp://www.reutershealth.com
Jurchen tribes under the leadership of the  http://www.ccil.org/~cowan
Manchus, his successor Abahai (1592-1643)   [EMAIL PROTECTED]
issued an order that the name Jurchen should   --S. Robert Ramsey,
be banned, and from then on, they were all The Languages of China
to be called Manchus.



Re: Nicest UTF

2004-12-10 Thread John Cowan
Philippe Verdy scripsit:

 And I disagree with you about the fact that U+0000 can't be used in XML 
 documents. It can be used in URI through URI escaping mechanism, as 
 explicitly indicated in the XML specification...

You have a hold of the right stick but at the wrong end.  U+0000 can be
encoded in a URI as %00, but that does not mean that the IRIs in system ids
and namespace names (and potentially other places) can contain explicit
U+0000 characters or &#x0; escapes either.  Both of those are illegal,
and documents that contain them are not well-formed.

In character content and attribute values, U+0000 is not possible.

 And the fact that the various character productions, that are normally 
 normative, have been changed so often, sometimes through erratas that 
 were forgotten in the text of the next edition of the standard,  

Do you have evidence for this claim?

 The only thing about which I can agree is that XML will forbid surrogates 
 and U+FFFE and U+FFFF, but I won't say that an XML parser that does not 
 reject NULs or other non-characters or disallowed C0 controls is so 
 buggy. 

You are of course entitled to your uninformed opinion.

 But all this is also proof that XML documents are definitely NOT 
 plain-text documents, so you can't use Unicode encoding rules at the 
 encoded XML document level, only at the finest plain-text nodes (these 
 are the levels that the productions in the XML standard are trying, with 
 more or less success, to standardize).

You can't blindly do *normalization* of XML documents as if they were
plain text.  *Encoding* XML documents according to Unicode is of course
possible and desirable.

 As a consequence any process that blindly applies a plain-text 
 normalization to a complete XML document is bogus, because it breaks the 
 most basic XML conformance, i.e. the core document structure...

In one extraordinarily unlikely case, yes: the appearance of a
combining overlay slash following the '>' that closes a tag will
damage the document if it is NFC-normalized.

-- 
You are a child of the universe no less John Cowan
than the trees and all other acyclichttp://www.reutershealth.com
graphs; you have a right to be here.http://www.ccil.org/~cowan
  --DeXiderata by Sean McGrath  [EMAIL PROTECTED]



Software support costs (was: Nicest UTF)

2004-12-10 Thread Carl W. Brown
Philippe,

 Also a broken opening tag for HTML/XML documents

In addition to not having endian problems UTF-8 is also useful when tracing
intersystem communications data because XML and other tags are usually in
the ASCII subset of UTF-8 and stand out making it easier to find the
specific data you are looking for.

However, within the program itself UTF-8 presents a problem when looking for
specific data in memory buffers.  It is nasty, time consuming and error
prone.  Mapping UTF-16 to code points is a snap as long as you do not have a
lot of surrogates.  If you do then probably UTF-32 should be considered.

From a support-cost standpoint, there are valid reasons to use a mix of UTF formats.

Carl





Re: Nicest UTF

2004-12-10 Thread Marcin 'Qrczak' Kowalczyk
D. Starner [EMAIL PROTECTED] writes:

 String equality in a programming language should not treat composed
 and decomposed forms as equal. Not this level of abstraction.

 This implies that every programmer needs an in-depth knowledge of
 Unicode to handle simple strings.

There is no way to avoid that.

If the runtime automatically performed NFC on input, then a part of a
program which is supposed to pass a string unmodified would sometimes
modify it. Similarly with NFD.

You can't expect each and every program which compares strings to
perform normalization (e.g. Linux kernel with filenames).

Perhaps if there was a single normalization format which everybody
agreed to, and unnormalized strings were never used for data
interchange (if UTF-8 was specified so as to disallow unnormalized
data, etc.), things would be different. But Unicode treats both
composed and decomposed representations as valid.

 IMHO splitting into graphemes is the job of a rendering engine, not of
 a function which extracts a part of a string which matches a regex.

 So S should _sometimes_ match an accented S? Again, I feel extended misery
 of explaining to people why things aren't working right coming on.

Well, otherwise things get ambiguous, similarly to these XML issues.
Does \n followed by a combining code point start a new line? Does
a double quote followed by a combining code point start a string
literal? Does a slash followed by a combining code point separate
subdirectory names?

An iterator which delivers whole combining character sequences out of
a sequence of code points can be used. You can also manipulate strings
as arrays of combining character sequences. But if you insist that
this is the primary string representation, you become incompatible
with most programs which have different ideas about delimited strings.
You can't expect each and every program to check combining classes
of processed characters. It's hard enough to convince them that a
character is not the same as a byte.
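
Such an iterator is easy to sketch (Python, using unicodedata; the helper name is mine): group each base code point with the combining marks that follow it.

    import unicodedata

    def combining_sequences(text):
        cluster = ''
        for ch in text:
            # A non-combining code point starts a new sequence.
            if cluster and unicodedata.combining(ch) == 0:
                yield cluster
                cluster = ''
            cluster += ch
        if cluster:
            yield cluster

    print(list(combining_sequences('e\u0301x')))   # ['e\u0301', 'x']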

 I expect breakage of XML-based protocols if implementations are
 actually changed to conform to these rules (I bet they don't now).

 Really? In what cases are you storing isolated combining code points
 in XML as text?

In case I want to circumvent security or deliberately cause a piece of
software to misbehave. Robustness requires unambiguous and simple rules.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/


Re: Nicest UTF

2004-12-10 Thread Marcin 'Qrczak' Kowalczyk
Philippe Verdy [EMAIL PROTECTED] writes:

 The XML/HTML core syntax is defined with fixed behavior of some
 individual characters like '<', '>', quotation marks, and with special
 behavior for spaces.

The point is: what characters mean in this sentence. Code points?
Combining character sequences? Something else?

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/


Re: Software support costs (was: Nicest UTF)

2004-12-10 Thread Theodore H. Smith
Philippe,
Also a broken opening tag for HTML/XML documents
In addition to not having endian problems UTF-8 is also useful when 
tracing
intersystem communications data because XML and other tags are usually 
in
the ASCII subset of UTF-8 and stand out making it easier to find the
specific data you are looking for.
That was the whole point of my original thread.
What you say is simply not true. You can process UTF-8 as bytes. Using 
your approach, even UTF-16 needs multiple code points to be treated as a 
character, because of decomposed characters.

But with most tasks (but not all), you can treat Unicode as bytes, 
using UTF-8.

I've done this extensively, and it works just fine.
The reason I repeat this, is because even people like me (who are able 
to understand) could be confused, if they receive the wrong information 
and none of the right information.

If someone who was able to understand UTF-8 got both the right and 
wrong information, they'd be able to make up their own mind. But if 
they just got the wrong information, they could be misled, as I was.

Which is why I'm repeating that you can treat UTF-8 as bytes, most of 
the time, and it works just perfectly.
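
A small illustration of the byte-level processing being defended here (purely a sketch): ASCII delimiters can be found in UTF-8 byte strings without decoding, because every byte of a multi-byte sequence has the high bit set.

    data = 'naïve café résumé'.encode('utf-8')
    # Splitting on an ASCII space never lands inside a multi-byte sequence:
    # bytes 0x00..0x7F only ever encode the ASCII characters themselves.
    print([piece.decode('utf-8') for piece in data.split(b' ')])
    # -> ['naïve', 'café', 'résumé']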

--
   Theodore H. Smith - Software Developer - www.elfdata.com/plugin/
   Industrial strength string processing code, made easy.
   (If you believe that's an oxymoron, see for yourself.)


Re: Nicest UTF

2004-12-10 Thread Marcin 'Qrczak' Kowalczyk
John Cowan [EMAIL PROTECTED] writes:

  The XML/HTML core syntax is defined with fixed behavior of some
  individual characters like '<', '>', quotation marks, and with special
  behavior for spaces.
 
 The point is: what characters mean in this sentence. Code points?
 Combining character sequences? Something else?

 Neither.  Unicode characters.

What does "Unicode characters" mean?

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/


Re: Nicest UTF

2004-12-10 Thread Philippe Verdy
- Original Message - 
From: Marcin 'Qrczak' Kowalczyk [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Friday, December 10, 2004 8:35 PM
Subject: Re: Nicest UTF


Philippe Verdy [EMAIL PROTECTED] writes:
The XML/HTML core syntax is defined with fixed behavior of some
individual characters like '<', '>', quotation marks, and with special
behavior for spaces.
The point is: what characters mean in this sentence. Code points?
Combining character sequences? Something else?
See the XML character model document... XML ignores combining sequences. But 
for Unicode and for XML a character is an abstract character with a single 
code allocated in a *finite* repertoire. The repertoire of all possible 
combining character sequences is already infinite in Unicode, as well as 
the number of default grapheme clusters they can represent.




Re: Nicest UTF

2004-12-10 Thread Marcin 'Qrczak' Kowalczyk
John Cowan [EMAIL PROTECTED] writes:

  The XML/HTML core syntax is defined with fixed behavior of some
  individual characters like '<', '>', quotation marks, and with special
  behavior for spaces.
 
 The point is: what characters mean in this sentence. Code points?
 Combining character sequences? Something else?

 Neither.  Unicode characters.

http://www.w3.org/TR/2000/REC-xml-20001006#charsets
implies that the appropriate level for parsing XML is code points.

In particular XML allows a combining character directly after .

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/


Re: Nicest UTF

2004-12-10 Thread Philippe Verdy
From: John Cowan [EMAIL PROTECTED]
Marcin 'Qrczak' Kowalczyk scripsit:
http://www.w3.org/TR/2000/REC-xml-20001006#charsets
implies that the appropriate level for parsing XML is code points.
You are reading the XML Recommendation incorrectly.  It is not defined
in terms of codepoints (8-bit, 16-bit, or 32-bit) but in terms of
characters.  XML processors are required to process UTF-8 and UTF-16,
and may process other character encodings or not.  But the internal
model is that of characters.  Thus surrogate code points are not
allowed.
I have a different reading, because the character in XML is not the same as 
the character in Unicode. For XML, U+10FFFF is a valid character (even if 
its use is explicitly not recommended, it is perfectly valid), for Unicode 
it's a non-character... For XML, U+0001 is *sometimes* a valid character, 
sometimes not.

And I disagree with you about the fact that U+0000 can't be used in XML 
documents. It can be used in URI through URI escaping mechanism, as 
explicitly indicated in the XML specification...

And the fact that the various character productions, that are normally 
normative, have been changed so often, sometimes through errata that were 
forgotten in the text of the next edition of the standard, then reintroduced 
in an errata, shows that these productions are less reliable than the 
descriptive *definitions* which ARE normative in XML...

The only thing about which I can agree is that XML will forbid surrogates 
and U+FFFE and U+FFFF, but I won't say that an XML parser that does not 
reject NULs or other non-characters or disallowed C0 controls is so 
buggy. I do think that these restrictions are a defect of XML...

But all this is also proof that XML documents are definitely NOT 
plain-text documents, so you can't use Unicode encoding rules at the encoded 
XML document level, only at the finest plain-text nodes (these are the 
levels that the productions in the XML standard are trying, with more or 
less success, to standardize).

As a consequence any process that blindly applies a plain-text normalization 
to a complete XML document is bogus, because it breaks the most basic XML 
conformance, i.e. the core document structure...




Re: Nicest UTF

2004-12-10 Thread Philippe Verdy
From: D. Starner [EMAIL PROTECTED]
Okay, I'm confused. Does &#8814; open a tag? Does it matter if it's 
composed or
decomposed?
It does not open an XML tag.
It does matter if it's composed (won't open a tag) or decomposed (will open 
a tag, but with a combining character, invalid as an identifier start)

Conclusion 1: blind normalizations of XML documents, as if they were 
plain-text documents, can break the XML well-formedness of these 
documents. This is caused by the fact that plain-text documents can be 
parsed by units of grapheme clusters or combining sequences. But XML parsing 
stops at the one-codepoint character level, and ignores canonical 
equivalences.
Conclusion 2: XML documents are not plain-text documents.




Re: Invalid UTF-8 sequences (was: Re: Nicest UTF)

2004-12-09 Thread Antoine Leca
On Monday, December 6th, 2004 20:52Z John Cowan va escriure:

 Doug Ewell scripsit:

 Now suppose you have a UNIX filesystem, containing filenames in a
 legacy encoding (possibly even more than one). If one wants to
 switch to UTF-8 filenames, what is one supposed to do? Convert all
 filenames to UTF-8?

 Well, yes.  Doesn't the file system dictate what encoding it uses for
 file names?  How would it interpret file names with unknown
 characters from a legacy encoding?  How would they be handled in a
 directory search?

 Windows filesystems do know what encoding they use.

Err, not really. MS-DOS *needs to know* the encoding to use, a bit like a
*nix application that displays filenames needs to know the encoding to use
the correct set of glyphs (but the constraints are much heavier). Also
Windows NT Unicode applications know it, because it can't be changed :-).

But when it comes to other Windows applications (still the more common) that
happen to operate in 'Ansi' mode, they are subject to the hazard of codepage
translations. Even if Windows 'knows' the encoding used for the filesystem
(as when it uses NTFS or Joliet, or VFAT on NT kernels; in the other cases
it does not even know it, much like with *nix kernels), the only usable set
is the _intersection_ of the set used to write and the set used to read;
that is, usually, it is restricted to US ASCII, very much like the usable
set in *nix cases...


Antoine




Re: Invalid UTF-8 sequences (was: Re: Nicest UTF)

2004-12-09 Thread Arcane Jill
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Behalf Of Antoine Leca
Sent: 09 December 2004 11:29
To: Unicode Mailing List
Subject: Re: Invalid UTF-8 sequences (was: Re: Nicest UTF)
Windows filesystems do know what encoding they use.
Err, not really. MS-DOS *needs to know* the encoding to use, a bit like a
*nix application that displays filenames needs to know the encoding to use
the correct set of glyphs (but the constraints are much heavier).
Sure, but MS-DOS is not Windows. MS-DOS uses 8.3 filenames. But it's not 
like MS-DOS is still terrifically popular these days.


But when it comes to other Windows applications (still the more common) 
that
happen to operate in 'Ansi' mode, they are subject to the hazard of codepage
translations.
Sure, but this has got nothing to do with the filesystem. The Windows 
filesystem(s) store filenames in those disk sectors which are reserved for 
file headers, and in these locations they are stored using sixteen-bit-wide 
code units. (I assume this can only be UTF-16?) Thus, "Windows filesystems 
do know what encoding they use" seems to me to be a correct statement.

The fact that applications can still open files using the legacy fopen() 
call (which requires char*, hence 8-bit-wide, strings) is kind of 
irrelevant. If the user creates a file using fopen() via a code page 
translation, AND GETS IT WRONG, then the file will be created with Unicode 
characters other than those she intended - but those characters will still 
be Unicode and unambiguous, no?

that is, usually, it is restricted to US ASCII, very much like the usable
set in *nix cases...
[OFF TOPIC] Why do so many people call it US ASCII anyway? Since ASCII 
comprises that subset of Unicode from U+0000 to U+007F, it is not clear to 
me in what way US-ASCII is different from ASCII. It's bad enough for us 
non-Americans that the A in ASCII already stands for American, but to 
stick US on the front as well is just ... Anyway, back to the discussion 
on US-Unicode...




Re: Nicest UTF

2004-12-09 Thread Philippe Verdy
From: Marcin 'Qrczak' Kowalczyk [EMAIL PROTECTED]
Ok, so it's the conversion from raw text to escaped character
references which should treat combining characters specially.
What about '<' with combining acute, which doesn't have a precomposed
form? A broken opening tag or a valid text character?
Also a broken opening tag for HTML/XML documents, which are NOT plain text 
documents and must first be parsed as HTML/XML before parsing the many 
text sections contained in text elements, element names, attribute names, 
attribute values, etc. as plain text under the restrictions specified in 
the HTML or XML specifications (which contain restrictions, for example, on 
which characters are allowed in names).

The XML/HTML core syntax is defined with fixed behavior of some individual 
characters like '<', '>', quotation marks, and with special behavior for 
spaces. This core structure is not plain text, and cannot be overridden, even 
by Unicode grapheme clusters.

Note that HTML/XML do NOT mandate the use or even the support of Unicode, 
just the support of a character repertoire that contains some required 
characters, and the acceptance of at least the ISO/IEC 10646 repertoire under 
some conditions; however the encoding to code points itself is not required 
for anything other than numeric character references, which are more 
symbolic, in a way similar to other named character entities in SGML, than 
absolute as implying the required support of the repertoire with a single 
code!

So you can as well create fully conforming HTML or XML documents using a 
character set which includes characters not even defined in Unicode/ISO/IEC 
10646, or characters defined only symbolically with just a name. Whether this 
name maps or not to one or more Unicode characters does not change the 
validity of the document itself.

And all the XML/HTML behavior ignores almost all Unicode properties 
(including normalization properties, because XML and HTML treat different 
strings, which are still canonically equivalent, as completely distinct; an 
important feature for cases like XML Signatures, where normalization of 
documents should not be applied blindly as it would break the data 
signature).

If you want to normalize XML documents, you should not do it with a 
normalizer working on the whole document as if it was plain text. Instead 
you must normalize the individual strings that are in the XML InfoSet, as 
accessible when browsing the nodes of its DOM tree, and then you can 
serialize the normalized tree to create a new document (using CDATA sections 
and/or character references, if needed to escape some syntactic characters 
reserved by XML that would be present in the string data of DOM tree nodes).
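
As one possible reading of this, a Python sketch using the standard library DOM parser (minidom): only the character data inside the tree is normalized, never the serialized document as a whole. Attribute values would need the same treatment; this is only an illustration.

    import unicodedata
    from xml.dom import minidom

    def normalize_text_nodes(node, form='NFC'):
        if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE):
            node.data = unicodedata.normalize(form, node.data)
        for child in node.childNodes:
            normalize_text_nodes(child, form)

    doc = minidom.parseString('<a>Cafe\u0301</a>')
    normalize_text_nodes(doc.documentElement)
    print(doc.toxml())   # the combining acute is composed inside the text node only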

Note also that an XML document containing references to Unicode 
non-characters would still be well-formed, because these characters may be 
part of a non-Unicode charset.

XML document validation is a separate and optional problem from XML parsing 
which checks well-formedness and builds a DOM tree: validation is only 
performed when matching the DOM tree according to a schema definition, DTD 
or XSD, in which additional restrictions on allowed characters may be 
checked, or in which additional symbolic-only characters may be defined 
and used in the XML document with parsable named entities similar to: 
&gt;.

(An example: the schema may contain a definition for a character 
representing a private company logo, mapped to a symbolic name; the XML 
document can contain such references, but the DTD may also define an 
encoding for it in a private charset, so that the XML document will directly 
use that code; the Apple logo in Macintosh charsets is an example, for which 
an internal mapping to Unicode PUAs is not sufficient to allow correct 
processing of multiple XML documents, where PUAs used in each XML documents 
have no equivalence; the conversion of such documents to Unicode with these 
PUAs is a lossy conversion, not suitable for XML data processing).




Re: Nicest UTF

2004-12-09 Thread Philippe Verdy
From: D. Starner [EMAIL PROTECTED]
Marcin 'Qrczak' Kowalczyk [EMAIL PROTECTED] writes:
If it's a broken character reference, then what about A&#769; (769 is
the code for combining acute if I'm not mistaken)?
Please start adding spaces to your entity references or
something, because those of us reading this through a web interface
are getting very confused.
No confusion possible if using any classic mail reader.
Blame your ISP (and other ISPs as well like AOL that don't respect the 
interoperable standards for plain-text emails) for its poor webmail 
interface, that does not properly escape the characters used in plain-text 
emails you receive (and that do NOT contain any HTML entities), but that 
get inserted blindly within the HTML page they create in their webmail 
interface.

Not only is such a webmail interface bogus, but it is also dangerous as it 
allows arbitrary HTML code to run from plain-text emails. Ask for support 
and press your ISP to correct its server-side scripts so that it will 
correctly support plain-text emails!




Re: Invalid UTF-8 sequences (was: Re: Nicest UTF)

2004-12-09 Thread Philippe Verdy
From: Antoine Leca [EMAIL PROTECTED]
Err, not really. MS-DOS *needs to know* the encoding to use, a bit like a
*nix application that displays filenames needs to know the encoding to use
the correct set of glyphs (but the constraints are much heavier). Also
Windows NT Unicode applications know it, because it can't be changed :-).
But when it comes to other Windows applications (still the more common) 
that
happen to operate in 'Ansi' mode, they are subject to the hazard of 
codepage
translations. Even if Windows 'knows' the encoding used for the filesystem
(as when it uses NTFS or Joliet, or VFAT on NT kernels; in the other cases
it does not even know it, much like with *nix kernels), the only usable 
set
is the _intersection_ of the set used to write and the set used to read;
that is, usually, it is restricted to US ASCII, very much like the usable
set in *nix cases...
True, but this applies to FAT-only filesystems, which happen to store 
filenames with an OEM charset which is not stored explicitly on the volume. 
This is a known caveat even for Unix, when you look at the tricky details of 
the support of Windows file sharing through Samba, when the client requests 
a file with a short 8.3 name, that a partition used by Windows is supposed 
to support.

In fact, this nightmare comes from the support in Windows of the 
compatibility with legacy DOS applications which don't know the details and 
don't use the Win32 APIs with Unicode support. Note that DOS applications 
use an OEM charset which is part of the user settings, not part of the 
system settings (see the effects of the command CHCP in a DOS command 
prompt).

FAT32 and NTFS help reconcile these incompatible charsets because these 
filesystems also store a LFN (Long File Name) for the same files (in that 
case the short name, encoded in some ambiguous OEM charset, is just an 
alias, acting exactly like a hard link on Unix created in the same directory 
that references the same file). LFN names are UTF-16 encoded and support 
mostly the same names as in NTFS volumes.

However, on FAT32 volumes, the short names are mandatory, unlike on NTFS 
volumes where they can be created on the fly by the filesystem driver, 
according to the current user settings for the selected OEM charset, without 
storing them explicitly on the volume. Windows contains, in CHKDSK, a way to 
verify that short names of FAT32 filesystems are properly encoded with a 
coherent OEM charset, using the UTF-16 encoded LFN names as a reference. If 
needed, corrections for the OEM charset can be applied...

This nightmare of incompatible OEM charsets does happen on Windows 98/98SE/ME, 
when the autoexec.bat file that defines the current user profile is not 
executing as it should the proper CHCP command, or when this autoexec.bat 
file has been modified or erased: in that case, the default OEM charset 
(codepage 437) is used, and short filenames are incorrectly encoded.

Another complexity is that Win32 applications, that use a fixed (not 
user-settable) ANSI charset, and that don't use the Unicode API depend on 
the conversion from the ANSI charset to the current OEM charset. But if a 
file is handled through some directory shares via multiple hosts, that have 
distinct ANSI charsets (i.e. Windows hosts running different localization of 
Windows, such as a US installation and a French version in the same LAN), 
the charsets viewed by these hosts will create incompatible encodings on the 
same shared volume.

So the only stable subset for short names, that is not affected by OS 
localization or user settings is the intersection of all possible ANSI and 
OEM charsets that can be set in all versions of Windows! No need to say, 
this designates only the printable ASCII charset for short 8.3 names. Long 
filenames are not affected by this problem.

Conclusion: to use international characters outside ASCII in filenames used 
by Windows, make sure that the name is not in an 8.3 short format, so 
that a long filename, in UTF-16, will be created on FAT32 filesystems or on 
SMBFS shares (Samba on Unix/Linux, Windows servers)... Or use NTFS (but then 
resolve the interoperability problems with Linux/Unix client hosts that 
can't access reliably, for now, to these filesystems, and that are not 
completely emulated by Unix filesystems used by Samba, due to the limitation 
on the LanMan sharing protocol, and limitations of Unix filesystems as well 
that rarely use UTF-8 as their prefered encoding...)




Re: Nicest UTF

2004-12-09 Thread Doug Ewell
Philippe Verdy verdy underscore p at wanadoo dot fr wrote:

 Please start adding spaces to your entity references or
 something, because those of us reading this through a web interface
 are getting very confused.

 No confusion possible if using any classic mail reader.

 Blame your ISP (and other ISPs as well like AOL that don't respect the
 interoperable standards for plain-text emails) for its poor webmail
 interface, that does not properly escape the characters...

No harm done in following David's suggestion, though, to help
accommodate the mail readers that do this.  It's just an e-mail, after
all.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/





RE: Invalid UTF-8 sequences (was: Re: Nicest UTF)

2004-12-08 Thread Lars Kristan





Doug Ewell wrote:
 How do file names work when the user changes from one SBCS to another
 (let's ignore UTF-8 for now) where the interpretation is 
 different? For
 example, byte C3 is U+00C3, A with tilde (Ã) in ISO 8859-1, 
 but U+0102,
A with breve (Ă) in ISO 8859-2. If a file name contains byte 
 C3, is its
 name different depending on the current locale?
It displays differently, but compares the same. Whether or not it is the same name is a philosophical question.


 Is it 
 accessible in all
 locales?
Typically, yes for all SBCS, but not really guaranteed for all MBCS. Depends on whether you validate the string or not. The way UNIX is being developed, those files are typically still accessible since the programs are still working with 8-bit strings. And that is what I am saying. A UTF-8 program (a hypothetical 'UNIX Commander 8') would have no problems accessing the files. A UTF-16 program (a hypothetical 'UNIX Commander 16') on the other hand would have problems.

 (Not every SBCS defines a character at every code point.
 There's no C3 in ISO 8859-3, for example.)
It works just like unassigned codepoints in Unicode work. How they are displayed is not defined, but they can be passed around and compared for equality. Collation is again not defined, but simple sorting does give useful results.

 
 Does this work with MBCS other than UTF-8? I know you said 
 other MBCS,
 like Shift-JIS, are not often used alongside other encodings except
 ASCII, but we can't guarantee that since we're not in a perfect world.
 :-) What if they were?
I don't know if and how much they were. But I am assuming UTF-8 would be used alongside other encodings on a much larger scale. At least that's what we are hoping for aren't we? Of course it would be even better if we would be only using UTF-8 (or any other Unicode format), but the transition has to come first.

 I fear Ken is not correct when he says you are not arguing for the
 legalization of invalid UTF-8 sequences.
I am arguing for a mechanism that allows processing invalid UTF-8 sequences. For those who need to do so. You can still think of them as invalid. Exactly what they will be called and to what extent they will be discouraged still needs to be investigated and defined.

 This isn't about UTF-8 versus other encoding forms. UTF-8-based
 programs will reject these invalid sequences because they don't map to
 code points, and because they are supposed to reject them.
The problem is, until now a text editor typically preserved all data if a file was opened and saved immediately. Even binary data. And the data could be interpreted as Latin 1, Latin 2, ... But you cannot interpret the data as UTF-8 and preserve all the data at the same time. Well, actually it is possible, which is exactly what I am saying is the advantage of UTF-8. But if you insist on validation, you break it. Fine, you get your Unicode world, and UTF-16 is then just as good as UTF-8. But you are now losing data where previously it wasn't lost. Well, you better remember to put a disclaimer in your license agreement...

  Besides, surrogates are not completely interchangeable. 
 Frankly, they
  are, but do not need to be, right?
 
 They are not completely. In UTF-8 and UTF-32, they are not allowed at
 all. In UTF-16, they may only occur in the proper context: a high
 surrogate may only occur before a low surrogate, and a low 
 surrogate may
 only appear after a high surrogate. No other usage of surrogates is
 permitted, because if unpaired surrogates could be interpreted, the
 interpretation would be ambiguous.
Well, yes, that's the theory. But as usual, I look at how things that are not defined yet work. From the algorithms, unpaired surrogates convert pretty well. Unless they start to pair up, of course. But there are cases where one knows they cannot (no concatenation is done).

Let me bring up one issue again. I want to standardize a mechanism that allows a roundtrip for 8-bit data. And I already stated that by doing that, you lose the roundtrip for 16-bit data. Now I ask myself again, is that true? Yes and no. For the case I mentioned above (no concatenation), roundtrip is currently really possible. But generally speaking, it is not always possible. And last but not least, you don't even care for it, right? Good, because that means my proposal doesn't make anything worse.

 I admit my error with regard to the handling of file names by 
 Unix-style
 file systems, and I appreciate being set straight.


Sorry for rubbing it in, but... could it be that a lot of the conclusions you have about what Unicode should or should not be are also wrong, if they were based on such incorrect assumptions?

 
 I think preserving transmission errors is carrying things too 
 far. Your
 Unix file system already doesn't guarantee that; if a byte 
 gets changed
 to 00 or 2F, you will have problems.
 
Like this one. Transmission, disk, memory errors (unless data

RE: Invalid UTF-8 sequences (was: Re: Nicest UTF)

2004-12-08 Thread Lars Kristan





 Needless to say, these systems were badly designed at their 
 origin, and 
 newer filesystems (and OS APIs) offer much better 
 alternative, by either 
 storing explicitly on volumes which encoding it uses, or by 
 forcing all 
 user-selected encodings to a common kernel encoding such as 
 Unicode encoding 
 schemes (this is what FAT32 and NTFS do on filenames created 
 under Windows, 
 since Windows 98 or NT).
 
The UNIX (I also call it variant) principle has a problem of not knowing the encoding.
The Windows (I also call it invariant) principle has a problem that it HAS to know the encoding.


The Windows principle has another problem, it can store data from any encoding, and it also does a good job of trying to represent the data in any encoding, but it cannot guarantee identification in just any encoding. An invariant store can be implemented as UTF-8 or UTF-16. Windows uses UTF-16 and guaranteed identification used to be only possible in UTF-16. Due to UTF-8, now it can also be done in 8-bit (console, telnet). But for some reason, support for UTF-8 is still limited in some areas. And the missing roundtrip capability may have something to do with it.

I basically agree that the variant approach is not a good one. But the invariant one is not an easy path. It was easier for Windows to take it, because at the time the transition was made, those systems were still single user. Hence, typically all data was in a single encoding.


Lars





Re: Nicest UTF

2004-12-08 Thread Marcin 'Qrczak' Kowalczyk
D. Starner [EMAIL PROTECTED] writes:

 You could hide combining characters, which would be extremely useful if 
 we were just using Latin and Cyrillic scripts.

It would need a separate API for examining the contents of a combining
character. You can't avoid the sequence of code points completely.

It would yield to surprising semantics: for example if you concatenate
a string with N+1 possible positions of an iterator with a string with
M+1 positions, you don't necessarily get a string with N+M+1 positions
because there can be combining characters at the border.
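
A small worked example of the boundary effect described above (Python; the counting helper is mine):

    import unicodedata

    def positions(text):
        # Iterator positions over combining character sequences: one before
        # each sequence plus one at the end; a leading combining mark forms
        # a defective sequence of its own.
        bases = sum(1 for ch in text if unicodedata.combining(ch) == 0)
        if text and unicodedata.combining(text[0]) != 0:
            bases += 1
        return bases + 1

    a, b = 'e', '\u0301x'                    # b begins with a combining acute
    print(positions(a), positions(b), positions(a + b))   # 2 3 3, not 2 + 3 - 1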

It's simpler to overlay various grouping styles on top of a sequence
of code points than to start with automatically combined combining
characters and process inwards and outwards from there (sometimes
looking inside characters, sometimes grouping them even more).

It would impose complexity in cases where it's not needed. Most of the
time you don't care which code points are combining and which are not,
for example when you compose a text file from many pieces (constants
and parts filled by users) or when parsing (if a string is specified
as ending with a double quote, then programs will in general treat a
double quote followed by a combining character as an end marker).

I believe code points are the appropriate general-purpose unit of
string processing.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/



RE: Invalid UTF-8 sequences (was: Re: Nicest UTF)

2004-12-08 Thread Lars Kristan





Kenneth Whistler wrote:
 I'm going to step in here, because this argument seems to
 be generating more heat than light.
I agree, and I thank you for that.


 First, I'm going to summarize what I think Lars Kristan is
 suggesting, to test whether my understanding of the proposal
 is correct or not.
 
 I do not think this is a proposal to amend UTF-8 to allow
 invalid sequences. So we should get that off the table.
At least until we all understand everything else about this issue.


 
 What I think this suggestion is is for adding 128 characters
 to represent byte values in conversion to Unicode when the
 byte values are uninterpretable as characters. Why 128 instead
 of 256 I find a little mysterious, but presumably the intent
 is to represent 0x80..0xFF as raw, uninterpreted byte values,
 unconvertible to Unicode characters otherwise.
Indeed, the full 256 codepoints could and perhaps even should be assigned for this purpose. The low 128 may in fact have a different purpose, and different handling. But I would delay this discussion also.

 
 This is suggested by Lars' use case of:
 
  Storing UNIX filenames in a Windows database.
 
 ... since UNIX filenames are simply arrays of bytes, and cannot,
 on interconnected systems, necessarily be interpreted in terms
 of well-defined characters.
 
 Apparently Lars is currently using PUA U+E080..U+E0FF
 (or U+EE80..U+EEFF ?) for this purpose, enabling the round-tripping
 of byte values uninterpretable as characters to be converted, and
 is asking for standard Unicode values for this purpose, instead.
Yes.
And, yes, it's U+EE80..U+EEFF.
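
A sketch of the kind of round-trip mapping under discussion, using the U+EE80..U+EEFF range Lars mentions (the function names and the decoding strategy are mine, not part of any proposal text): bytes that cannot be decoded as UTF-8 are mapped to one code point each, so the original byte string can be rebuilt.

    def decode_roundtrip(data: bytes) -> str:
        out, i = [], 0
        while i < len(data):
            for width in (4, 3, 2, 1):
                try:
                    out.append(data[i:i + width].decode('utf-8'))
                    i += width
                    break
                except UnicodeDecodeError:
                    continue
            else:
                # 0x80..0xFF -> U+EE80..U+EEFF (bytes below 0x80 always decode)
                out.append(chr(0xEE00 + data[i]))
                i += 1
        return ''.join(out)

    def encode_roundtrip(text: str) -> bytes:
        # Characters that legitimately were U+EE80..U+EEFF get conflated with
        # escaped bytes here, which is one of the objections raised in the thread.
        return b''.join(bytes([ord(c) - 0xEE00]) if 0xEE80 <= ord(c) <= 0xEEFF
                        else c.encode('utf-8') for c in text)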


 
 The other use case that Lars seems to be talking about are
 existing documents containing data corruptions in them, which
 can often happen when Latin-1 data gets dropped into UTF-8 data
 or vice versa due to mislabeled email or whatever.
Yes. One could argue that the need for the first use will gradually go away, that's why I also use this second example. Although, I think the first problem is underestimated. And is not limited to my example. And can have much more serious consequences. And might not go away anytime soon.

 And I am assuming this is referring primarily to the second case,
 where the extreme scenario Lars is envisioning would be, for
 example, where each point in a system was hyper-alert to
 invalid sequences and simply tossed or otherwise sequestered
 entire documents if they got these kinds of data corruptions
 in them. And in such a case, I can understand the concern about
 angry users. How many people on this list would be cursing if
 every bit of email that had a character set conversion error in
 it resulting in some bit hash or other, simply got tossed in the
 bit bucket instead of being delivered with the glorious hash
 intact, at least giving you the chance to see if you could
 figure out what was intended?
The two aspects of the problem are not always clearly distinct. But yes, let's say it's the second one.


I had the need to solve the first problem, not the second one. So some of what I say about this second one is somewhat theoretical. But also realistic, I hope. Or fear.

 
 This is, I think the basic point at which people are talking past each
 other.
 
 Notionally, Doug is correct that UTF-8 and UTF-16 are equivalent
 encoding forms, and anything represented (correctly) in one can
 be represented (correctly) in the other. In that sense, there is
 no difference between representation of text in UTF-8 or UTF-16,
 and no reason to postulate that a UTF-8 based program will have
 any advantages or disadvantages over a UTF-16 based program when
 it comes to dealing with corrupted data.
 
 What Lars is talking about is a broad class of UNIX-based software
 which is written to handle strings essentially as
 opaque bags of bytes, not caring what they contain for many
 purposes. Such software generally keeps working just fine if you
 pump UTF-8 at it, which is by design for UTF-8 -- precisely because
 UTF-8 leaves untouched all the 0x00..0x7F byte values that may
 have particular significance for those processes. Most of that
 software treats 0x80..0xFF just as bit hash from the get-go, and
 neither cares nor has any way of knowing if the particular
 sequence of bit hash is valid UTF-8 or Shift-JIS or Latin-1 or
 EUC-JIS or some mix or whatever.
Yes. With a couple of additions.


It is not true that most of that software doesn't care about the encoding. Copy or cat really don't need to, but more does, to count the lines properly (it needs to know the number of output glyphs or whatever they are, in order to know where line breaks will occur). If it is told that the console will interpret the stream as UTF-8, then it must process the data accordingly.

And it should do so as best it can. Dropping invalid sequences is questionable. You could say that they won't be printable anyway, so it doesn't matter if you replace them with U+FFFD
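
For concreteness, the substitution being described looks like this in Python (errors='replace' is the standard library's U+FFFD behaviour):

    data = b'caf\xe9'                                # ISO 8859-1 bytes handed to a UTF-8 consumer
    print(data.decode('utf-8', errors='replace'))    # 'caf' + U+FFFD, the 0xE9 byte is gone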

Re: Invalid UTF-8 sequences (was: Re: Nicest UTF)

2004-12-08 Thread Kenneth Whistler
John Cowan responded:
 
  Storage of UNIX filenames on Windows databases, for example,
^^

O.k., I just quoted this back from the original email, but
it really is a complete misconception of the issue for
databases. Windows databases is a misnomer to start with.

There are some databases, like Access, that are Windows-only
applications, but most serious SQL databases in production (DB2,
Oracle, Sybase ASE and ASA, and so on) are crossplatform from
the get go, and have their *own* rules for what can and
cannot legitimately be stored in data fields, independent
of what platform you are running them on. A Sybase ASE
database has the same behavior running on Windows as running
on Sun Solaris or Linux, for that matter.

  can be done with BINARY fields, which correctly capture the
  identity of them as what they are: an unconvertible array of
  byte values, not a convertible string in some particular
  code page.
 
 This solution, however, is overkill, 

Actually, I don't think it is.

One of the serious classes of fundamental errors that
database administrators and database programmers run into
when creating global applications is ignoring or misconstruing
character set issues.

In a database, if I define the database (or table or field)
as containing UTF-8 data, it damn well better have UTF-8
data in it, or I'm just asking for index corruptions, data
corruptions or worse -- and calls from unhappy customers.
When database programmers lie to the database about
character sets, by setting a character set to Latin-1, say,
and then pumping in data which is actually UTF-8, for
instance, expecting it to come back out unchanged with
no problems, they are skating on very thin ice ... which
usually tends to break right in the middle of some critical
application during a holiday while your customer service
desk is also down. ;-)

Such lying to the database is generally the tactic of
first resort for fixing global applications when they
start having to deal with mixed Japanese/European/UTF-8
data on networks, but it is clearly a hack for not
understanding and dealing with the character set
architecture and interoperability problems of putting
such applications together.

UNIX filenames are just one instance of this. The first
mistake is to network things together in ways that create
a technical mismatch between what the users of the localized
systems think the filenames mean and what somebody on the
other end of such a system may end up interpreting the
bag o' bytes to mean. The application should be constructed
in such a way that the locale/charset state can be preserved
on connection, with the filename interpreted in terms
of characters in the realm that needs to deal with it
that way, and restored to its bag o' bytes at the point
that needs it that way. If you can't do that reliably
with a raw UNIX set of applications, c'est la vie -- you
should be building more sophisticated multi-tiered applications
on top of your UNIX layer, applications which *can* track
and properly handle locale and character set identities.

Failing that, then BINARY fields *are* the appropriate
way to deal with arbitrary arrays of bytes that cannot
be interpreted as characters. Trying to pump them into
UTF-8 text data fields and processing them as such when
they *aren't* UTF-8 text data is lying to the database
and basically forfeiting your warranty that the database
will do reasonable things with that data. It's as stupid
as trying to store date or numeric types in text data
fields without first converting them to formatted strings
of text data.

 in the same way that it would
 be overkill to encode all 8-bit strings in XML using Base-64
 just because some of them may contain control characters that are
 illegal in well-formed XML.

Dunno about the XML issue here -- you're the expert on what
the expected level of illegality in usage is there.

But for real database applications, there are usually
mountains and mountains of stuff going on, most of it
completely orthogonal to something as conceptually
straightforward as maintaining the correct interpretation
of a UNIX filename. It isn't really overkill, in my
opinion, to design the appropriate tables and metadata
needed for ensuring that your filename handling doesn't
blow up somewhere because you've tried to do an UPDATE
on a UTF-8 data field with some random bag o' bytes that
won't validate as UTF-8 data.

 
  In my opinion, trying to do that with a set of encoded characters
  (these 128 or something else) is *less* likely to solve the
  problem than using some visible markup convention instead.
 
 The trouble with the visible markup, or even the PUA, is that
 well-formed filenames, those which are interpretable as
 UTF-8 text, must also be encoded so as to be sure any
 markup or PUA that naturally appears in the filename is
 escaped properly.  This is essentially the Quoted-Printable
 encoding, which is quite rightly known to those stuck 

Re: Nicest UTF

2004-12-08 Thread D. Starner
Marcin 'Qrczak' Kowalczyk [EMAIL PROTECTED] writes:
 D. Starner [EMAIL PROTECTED] writes:

  You could hide combining characters, which would be extremely useful if we 
  were just using Latin 
  and Cyrillic scripts.
 
 It would need a separate API for examining the contents of a combining
 character. You can't avoid the sequence of code points completely.

Not a separate API; a function that takes a character and returns an array of 
integers.
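
A rough sketch of the kind of accessor being described (Python; the Grapheme class and code_points name are illustrative, not anything proposed in the thread):

    # The string API hands around opaque "characters" (base + combining marks);
    # one function exposes the underlying code points as an array of integers.
    from unicodedata import combining

    class Grapheme:
        def __init__(self, base: str, marks: str = ""):
            assert all(combining(m) for m in marks)   # marks must be combining
            self._text = base + marks

        def code_points(self) -> list[int]:
            return [ord(c) for c in self._text]

    a_acute = Grapheme("a", "\u0301")   # 'a' + COMBINING ACUTE ACCENT
    print(a_acute.code_points())        # [97, 769]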

 It would yield to surprising semantics: for example if you concatenate
 a string with N+1 possible positions of an iterator with a string with
 M+1 positions, you don't necessarily get a string with N+M+1 positions
 because there can be combining characters at the border.

The semantics there are surprising, but that's true no matter what you
do. An NFC string + an NFC string may not be NFC; the resulting text
doesn't have N+M graphemes. Unless you're explicitly adding a combining
character, a combining character should never start a string. This could 
be fixed several ways, including by inserting a dummy character to hold 
the combining character, and normalizing the string by removing the dummy 
characters. That would, for the most part, only hurt pathological cases.
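
The border effect is easy to check with Python's unicodedata (is_normalized needs Python 3.8 or later); both operands are NFC on their own, but the concatenation is not, and it collapses to a single precomposed character:

    import unicodedata

    left = "e"          # NFC by itself
    right = "\u0301"    # COMBINING ACUTE ACCENT; a defective but NFC string
    joined = left + right

    print(unicodedata.is_normalized("NFC", left))    # True
    print(unicodedata.is_normalized("NFC", right))   # True
    print(unicodedata.is_normalized("NFC", joined))  # False
    print(unicodedata.normalize("NFC", joined))      # 'é' (U+00E9), one code point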

 It would impose complexity in cases where it's not needed. Most of the
 time you don't care which code points are combining and which are not,
 for example when you compose a text file from many pieces (constants
 and parts filled by users) or when parsing (if a string is specified
 as ending with a double quote, then programs will in general treat a
 double quote followed by a combining character as an end marker).

If you do so with a language that includes <, you violate the Unicode
standard, because <&#824; (not <) and &#8814; are canonically equivalent. You've
either got to decompose first or look at the characters as
a whole instead of looking at individual code points.
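
The specific equivalence at issue, checked with Python's unicodedata:

    import unicodedata

    seq = "<\u0338"                  # '<' + COMBINING LONG SOLIDUS OVERLAY
    not_less_than = "\u226e"         # NOT LESS-THAN

    print(unicodedata.normalize("NFC", seq) == not_less_than)   # True
    print(unicodedata.normalize("NFD", not_less_than) == seq)   # True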

Has anyone considered this while defining a language? How about the official
standards bodies? Searching for XML in the archives is a bit unhelpful, and
UTR #20 doesn't mention the issue. Your solution is just fine if you're
considering the issue on the bit level, but it strikes me as the wrong answer,
and I would think that it would be surprising to a user who didn't understand
Unicode, especially in the &#8814; case. A warning either way would be nice.

I'll see if I have time after finals to pound out a basic API that implements
this, in Ada or Lisp or something. It's not going to be the most efficient thing,
but I doubt it's going to be a big difference for most programs, and if you want
C, you know where to find it.






Re: Nicest UTF

2004-12-08 Thread Marcin 'Qrczak' Kowalczyk
D. Starner [EMAIL PROTECTED] writes:

 The semantics there are surprising, but that's true no matter what you
 do. An NFC string + an NFC string may not be NFC; the resulting text
 doesn't have N+M graphemes.

Which implies that automatically NFC-ing strings as they are processed
would be a bad idea. They can be NFC-ed at the end of processing if the
consumer of this data will demand this. Especially if other consumers
would want NFD.

String equality in a programming language should not treat composed
and decomposed forms as equal. Not this level of abstraction.

IMHO splitting into graphemes is the job of a rendering engine, not of
a function which extracts a part of a string which matches a regex.

 If you do so with a language that includes <, you violate the Unicode
 standard, because <&#824; (not <) and &#8814; are canonically equivalent.

I think that Unicode tries to push implications of equivalence
too far.

They are supposed to be equivalent when they are actual characters.
What if they are numeric character references? Should <&#824;
(7 characters) represent a valid plain-text character or be a broken
opening tag?

Note that if it's a valid plain-text character, it's impossible
to represent isolated combining code points in XML, and thus it's
impossible to use XML for transportation of data which allows isolated
combining code points (except by introducing custom escaping of
course, e.g. transmitting decimal numbers instead of characters).
I expect breakage of XML-based protocols if implementations are
actually changed to conform to these rules (I bet they don't now).

OTOH if it's not a valid plain-text character, then conversion between
numeric character references and actual characters is getting more
hairy.

 I'll see if I have time after finals to pound out a basic API that
 implements this, in Ada or Lisp or something.

My language is quite similar to Lisp semantically.

Implementing an API which works in terms of graphemes over an API
which works in terms of code points is more sane than the converse,
which suggests that the core API should use code points if both APIs
are sometimes needed at all.

While I'm not obsessed with efficiency, it would be nice if changing
the API would not slow down string processing too much.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/



Re: Nicest UTF

2004-12-08 Thread John Cowan
Marcin 'Qrczak' Kowalczyk scripsit:

 String equality in a programming language should not treat composed
 and decomposed forms as equal. Not this level of abstraction.

Well, that assumes that there's a special string equality predicate, as
distinct from just having various predicates that DWIM.  In a Unicode Lisp
implementation, e.g., equal might be char-by-char equality and equalp might not.

 They are supposed to be equivalent when they are actual characters.
 What if they are numeric character references? Should <&#824;
 (7 characters) represent a valid plain-text character or be a broken
 opening tag?

It's a broken opening tag.

 Note that if it's a valid plain-text character, it's impossible
 to represent isolated combining code points in XML, 

It's problematic to represent the *specific* combining code point
when it appears immediately after a tag.

-- 
Don't be so humble.  You're not that great. John Cowan
--Golda Meir[EMAIL PROTECTED]



Re: Nicest UTF

2004-12-08 Thread Marcin 'Qrczak' Kowalczyk
John Cowan [EMAIL PROTECTED] writes:

 String equality in a programming language should not treat composed
 and decomposed forms as equal. Not this level of abstraction.

 Well, that assumes that there's a special string equality predicate,
 as distinct from just having various predicates that DWIM.

No, I meant the default generic equality predicate when applied to two
strings.

 It's a broken opening tag.

Ok, so it's the conversion from raw text to escaped character
references which should treat combining characters specially.

What about < with combining acute, which doesn't have a precomposed
form? A broken opening tag or a valid text character?

What about &#65;ACUTE where ACUTE stands for combining acute? Is this
A with acute, or a broken character reference which ends with an
accented semicolon?

If it's a broken character reference, then what about A&#769; (769 is
the code for combining acute if I'm not mistaken)? If *this* is A with
acute, then it's inconsistent: here combining accents are processed
after resolving numeric character references, and previously it was
in the opposite order. OTOH if this is something else, then it's
impossible to represent letters without precomposed forms with numeric
character references.

The general trouble is that numeric character references can only
encode individual code points rather than graphemes (is this a correct
term for a non-combining code point with a sequence of combining code
points?). So if XML is supposed to be treated as a sequence of
graphemes, weird effects arise in the above boundary cases...

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/



Re: Nicest UTF

2004-12-08 Thread D. Starner
Marcin 'Qrczak' Kowalczyk writes:
 String equality in a programming language should not treat composed
 and decomposed forms as equal. Not this level of abstraction.

This implies that every programmer needs an in-depth knowledge of Unicode
to handle simple strings. The concept makes me want to replace Unicode;
spending the rest of my life explaining to programmers, and people who use
their programs, why a search for "Römische Elegien" isn't bringing up the book
is not my idea of happiness.

 IMHO splitting into graphemes is the job of a rendering engine, not of
 a function which extracts a part of a string which matches a regex.

So S should _sometimes_ match an accented S? Again, I feel extended misery
of explaining to people why things aren't working right coming on.

 They are supposed to be equivalent when they are actual characters.
 What if they are numeric character references? Should <&#824;
 (7 characters) represent a valid plain-text character or be a broken
 opening tag?

Which 7 characters? My email client turned them into the actual characters.
But I think it's fairly obvious that XML added entities in part so you
could include '<'s and other characters without them getting interpreted as
markup instead of as part of the text of the document. Similarly, a combining
character entity following an actual < should be the start of a tag. 

 Note that if it's a valid plain-text character, it's impossible
 to represent isolated combining code points in XML, 

No more than it's impossible to represent '<' in the text.

 I expect breakage of XML-based protocols if implementations are
 actually changed to conform to these rules (I bet they don't now).

Really? In what cases are you storing isolated combining code points
in XML as text? I can think of hypothetical cases, but most real-world
use isn't going to be affected. If I were designing such an XML protocol,
I'd probably store it as a decimal number anyway; XML is designed to
be human-readable, and an isolated combining character that randomly 
combines with other characters that it's not logically associated with 
when displayed isn't particularly human readable.

 Implementing an API which works in terms of graphemes over an API
 which works in terms of code points is more sane than the converse,
 which suggests that the core API should use code points if both APIs
 are sometimes needed at all.

Implementing an API which works in terms of lists over an API which works
in terms of pointers is more sane than the converse, which suggests that the
core API should use pointers if both APIs are sometimes needed at all.

 While I'm not obsessed with efficiency, it would be nice if changing
 the API would not slow down string processing too much.

Who knows how much it would slow down string processing? If I get around
to writing the test code, I'll try and see how much it slows stuff down,
but right now we don't know.






Re: Nicest UTF

2004-12-08 Thread D. Starner
Marcin 'Qrczak' Kowalczyk [EMAIL PROTECTED] writes:
 If it's a broken character reference, then what about A#769; (769 is
 the code for combining acute if I'm not mistaken)?

Please start adding spaces to your entity references or 
something, because those of us reading this through a web interface
are getting very confused.





Re: Nicest UTF

2004-12-08 Thread Kenneth Whistler
Marcin asked:

 The general trouble is that numeric character references can only
 encode individual code points

By design.

 rather than graphemes (is this a correct
 term for a non-combining code point with a sequence of combining code
 points?).

No. The correct term is combining character sequence.

TUS 4.0, p. 70, D17.

The correct NCR representation of a combining character sequence
is a sequence of NCR's. -- Not too surprisingly.

--Ken

 So if XML is supposed to be treated as a sequence of
 graphemes, weird effects arise in the above boundary cases...
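
Spelled out mechanically, Ken's rule above means one NCR per code point of the sequence; a one-line sketch in Python:

    seq = "A\u0301"    # LATIN CAPITAL LETTER A + COMBINING ACUTE ACCENT
    print("".join(f"&#{ord(c)};" for c in seq))    # &#65;&#769;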




RE: Invalid UTF-8 sequences (was: Re: Nicest UTF)

2004-12-08 Thread Kenneth Whistler
Lars responded:

  ... Whatever the solutions
  for representation of corrupt data bytes or uninterpreted data
  bytes on conversion to Unicode may be, that is irrelevant to the
  concerns on whether an application is using UTF-8 or UTF-16
  or UTF-32.

 The important fact is that if you have an 8-bit based program, and you
 provide a locale to support UTF-8, you can keep things working (unless you
 ^^^

You can keep *some* things *sorta* working.

If you don't make the effort to actually upgrade software to
use the standard *conformantly*, then it is no real surprise when
data corruptions creep in, characters get mislaid, and some things
don't work the way they should.

 prescribe validation). But you cannot achieve the same if you try to base
 your program on 16 or 32 bit strings. 

Of course you can. You just have to rewrite the program to handle
16-bit or 32-bit strings correctly. You can't pump them through
8-bit pipes or char* API's, but it's just silly to try that, because
they are different animals to begin with.

By the way, I participated as an engineer in a multi-year project
that shifted an advanced, distributed data analysis system
from an 8-bit character set to 16-bit Unicode. *All* user-visible string
processing was converted over -- and that included proprietary
file servers, comm servers, database gateways, networking code,
a proprietary 32-bit workstation GUI implementation, and a suite
of object-oriented application tools, including a spreadsheet,
plotting tool, query and database reporting tools, and much more.
It worked cross-platform, too.

It was completed, running, and *delivered* to customers in 1994,
a decade ago.

You can't bamboozle me with any of this "it can't be done with
16-bit strings" BS.

 Or, again, you really cannot with 16
 bit (UTF-16), 

Yes you can.

 and you sort of can with 32 bit (UTF-32), but must resort to
 values above 21 bits. 

No, you need not -- that is non-conformant, besides.

 Again, nothing standardized there, nothing defined for
 how functions like isspace should react and so on.

That is wrong, too. The standard information that people seek
is in the Unicode Character Database:

http://www.unicode.org/Public/UNIDATA/
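
As a rough illustration of where that information lives, queried here through Python's unicodedata module (which is generated from the UCD) rather than through ICU:

    import unicodedata

    # The data behind questions like "is this a space?" is a standard
    # character property, not something each UTF has to define separately.
    for cp in (0x0020, 0x00A0, 0x3000, 0x0041):
        ch = chr(cp)
        print(f"U+{cp:04X}", unicodedata.category(ch), unicodedata.name(ch))
    # U+0020 Zs SPACE
    # U+00A0 Zs NO-BREAK SPACE
    # U+3000 Zs IDEOGRAPHIC SPACE
    # U+0041 Lu LATIN CAPITAL LETTER A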

And there are standard(*) libraries such as ICU that provide public APIs
for programs to use to get the kind of behavior they need.

(*) Just because a library isn't an International Standard does
not mean that it is not a de facto standard that people can
and do rely upon for such program behavior.

You can't expect to just rely upon the C or C++ standards
and POSIX to solve all your application problems, but there
are perfectly good solutions working out there, in UTF-8,
in UTF-16, and in UTF-32. (Or in combinations of those.)

 And it's about the fact that it is far more likely that this
 happens to UTF-8 data (or that some legacy data is mistakenly labelled or
 assumed to be UTF-8).
 UTF-16 data is far cleaner than 8-bit data. Basically because you had to
 know the encoding in order to store the data in UTF-16.

Actually, I think this should be characterized as software engineers
writing software for UTF-16 are likely to do a better job of
handling characters, because they have to, whereas a lot of
stuff using UTF-8 just slides by, because people think they can
ignore character set issues long enough, so that when the problem
occurs, it can no longer be traced to mistakes they made or
that they are still held responsible for. ;-)

 UTF-8 is what solved the problems on UNIX. It allowed UNIX to process
 Windows data. Alongside its own.
 It is Windows that has problems now. And I think roundtripping is the
 solution that will allow Windows to process UNIX data. Without dropping data
 or raising exceptions. Alongside its own.

I just don't understand these assertions at all.

First of all it isn't "UNIX data" or "Windows data" -- it is
end user's data, which happens to be processed in software
systems which in turn are running on a UNIX or Windows OS.

I work for a company that *routinely* runs applications that
cross the platform barriers in all sorts of ways. It works
because character sets are handled conformantly, and conversions
are done carefully at platform boundaries -- not because some
hack has been added to UTF-8 to preserve data corruptions.

  There's more to it, of course, but this is, I believe, as the
  bottom of the reason why, for 12 years now, people have been
  fundamentally misunderstanding each other about UTF-8.
 Is it 12? Thought it was far less. 

Yes. The precursor of UTF-8 was dreamed up around 1992.

 Off topic, when was UTF-8 added to
 Unicode standard?

In Unicode 1.1, Appendix F, then known as FSS-UTF, in 1993.


 Quite close. Except for the fact that:
 * U+EE93 is represented in UTF-32 as 0xEE93
 * U+EE93 is represented in UTF-16 as 0xEE93
 * U+EE93 is represented in UTF-8 as 0x93 (_NOT_ 0xEE 0xBA 0x93)

Utterly 

Re: Nicest UTF

2004-12-07 Thread Philippe Verdy
From: D. Starner [EMAIL PROTECTED]
If you're talking about a language that hides the structure of strings
and has no problem with variable length data, then it wouldn't matter
what the internal processing of the string looks like. You'd need to
use iterators and discourage the use of arbitrary indexing, but arbitrary
indexing is rarely important.
I fully concur with this point of view. Almost all (if not all) string 
processing can be performed in terms of sequential enumerators, instead of 
through random indexing (which also has the big disadvantage of not allowing 
rich context-dependent processing behaviors, something you can't ignore 
when handling international texts).

So internal storage of strings does not matter for the programming interface 
of parsable string objects. In terms of efficiency and global application 
performance, using compressed encoding schemes is highly recommended for 
large databases of text, because the negative impact of the decompressing 
overhead is extremely small compared to the huge benefits you get when reducing 
the load on system resources, on data locality and on memory caches, on the 
system memory allocator, on the memory fragmentation level, on reduced VM 
swaps and on file or database I/O (which will be the only effective 
limitation for large databases).




RE: Invalid UTF-8 sequences (was: Re: Nicest UTF)

2004-12-07 Thread Lars Kristan
Title: RE: Invalid UTF-8 sequences (was: Re: Nicest UTF)





Doug Ewell wrote:


 John Cowan jcowan at reutershealth dot com wrote:
 
  Windows filesystems do know what encoding they use. But a 
 filename on
  a Unix(oid) file system is a mere sequence of octets, of 
 which only 00
  and 2F are interpreted. (Filenames containing 20, and 
 especially 0A,
  are annoying to handle with standard tools, but not illegal.)
 
  How these octet sequences are translated to characters, if at all,
  is no concern of the file system's. Some higher-level 
 tools, such as
  directory listers and shells, have hardwired assumptions, 
 others have
  changeable assumptions, but all are assumptions.
 
 OK, fair enough. Under a Unixoid file system, a file name 
 consists of a
 more or less arbitrary sequence of bytes, essentially 
 unregulated by the
 OS.
 
 If interpreted as UTF-8, some of these sequences may be 
 invalid, and the
 files may be inaccessible.
 
 This is *exactly* the same scenario as with GB 2312, or 
 Shift-JIS, or KS
 C 5601, or ISO 6937, or any other multibyte character encoding ever
 devised.
 
 This is not a problem that needs to be solved within Unicode, any more
 than it needed to be solved within those other encodings.
 


Shift-JIS was typically not mixed with other encodings, except for pure 7-bit ASCII. UTF-8 will be. And Shift-JIS had other serious problems, like the trailing backslash byte. UTF-8 has learned a lot from Shift-JIS. If there is anything still to learn, then let's welcome that.
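
The trailing-backslash problem is easy to demonstrate (Python; KATAKANA LETTER SO is one of the well-known offenders):

    text = "ソ"                            # U+30BD KATAKANA LETTER SO
    sjis = text.encode("shift_jis")        # b'\x83\\' -- the second byte is 0x5C
    print(b"\\" in sjis)                   # True: byte-oriented code sees a backslash
    print(b"\\" in text.encode("utf-8"))   # False: UTF-8 avoids this by design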

Also, Shift-JIS (and other MBCS encodings) were a must for those cultures. UTF-8 is not a must. If there will be problems, there will be complaints. And resistance.


Lars





Re: Nicest UTF.. UTF-9, UTF-36, UTF-80, UTF-64, ...

2004-12-07 Thread Kenneth Whistler
Philippe stated, and I need to correct:

 UTF-24 already exists as an encoding form (it is identical to UTF-32), if 
 you just consider that encoding forms just need to be able to represent a 
 valid code range within a single code unit.

This is false.

Unicode encoding forms exist by virtue of the establishment of
them as standard, by actions of the standardizing organization,
the Unicode Consortium.

 UTF-32 is not meant to be restricted on 32-bit representations.

This is false. The definition of UTF-32 is:

  The Unicode encoding form which assigns each Unicode scalar
   value to a single unsigned 32-bit code unit with the same
   numeric value as the Unicode scalar value.
   
It is true that UTF-32 could be (and is) implemented on computers
which hold 32-bit numeric types transiently in 64-bit registers
(or even other size registers), but if an array of 64-bit integers
(or 24-bit integers) were handed to some API claiming to be UTF-32,
it would simply be nonconformant to the standard.

UTF-24 does not already exist as an encoding form -- it already
exists as one of a large number of more or less idle speculations
by character numerologists regarding other cutesy ways to handle
Unicode characters on computers. Many of those cutesy ways are
mere thought experiments or even simply jokes.

 However it's true that UTF-24BE and UTF-24LE could be useful as a encoding 
 schemes for serializations to byte-oriented streams, suppressing one 
 unnecessary byte per code point.

Could be, perhaps, but is not.

Implementers using UTF-32 for processing efficiency, but who have
bandwidth constraints in some streaming context should simply
use one of the CES's with better size characteristics or use
a compression on their data.

 Note that 64-bit systems could do the same: 3 code points per 64-bit unit, 
 requires only 63 bits, that are stored in a single positive 64-bit integer 
 (the remaining bit would be the sign bit, always set to 0, avoiding problems 
 related to sign extensions). And even today's system could use such 
 representation as well, given that most 32-bit processors of today also have 
 the internal capabilities to manage 64-bit integers natively.

This is just an incredibly bad idea.

Packing instructions in large-word microprocessors is one thing. You
have built-in microcode which handles that, hidden away from
application-level programming, and carefully architected for
maximal processor efficiency.

But attempting to pack character data into microprocessor words, just
because you have bits available, would just detract from the efficiency
of handling that data. Storage is not the issue -- you want to
get the characters in and out of the registers as efficiently as
possible. UTF-32 works fine for that. UTF-16 works almost as well,
in aggregate, for that. And I could care less that when U+0061
goes in a 64-bit register for manipulation, the high 57 bits are
all set to zero.

 Strings could be encoded as well using only 64-bit code units that would 
 each store 1 to 3 code points, 

Yes, and pigs could fly, if they had big enough wings.

 the unused positions being filled with 
 invalid codepoints out of the Unicode space (for example by setting all 21 bits 
 to 1, producing the out-of-range code point 0x1FFFFF, used as a filler for 
 missing code points, notably when the string to encode is not an exact 
 multiple of 3 code points). Then, these 64-bit code units could be 
 serialized on byte streams as well, multiplying the number of possibilities: 
 UTF-64BE and UTF-64LE? One interest of such scheme is that it would be more 
 compact than UTF-32, because this UTF-64 encoding scheme would waste only 1 
 bit for 3 codepoints, instead 1 byte and 3 bits for each codepoint with 
 UTF-32!

Wow!

 You can imagine many other encoding schemes, depending on your architecture 
 choices and constraints...

Yes, one can imagine all sorts of strange things. I myself
imagined UTF-17 once. But there is a difference between having
fun imagining strange things and filling the list with
confusing misinterpretations of the status and use of
UTF-8, UTF-16, and UTF-32.

--Ken




Re: Nicest UTF.. UTF-9, UTF-36, UTF-80, UTF-64, ...

2004-12-07 Thread Rick McGowan
 Yes, and pigs could fly, if they had big enough wings.

An 8-foot wingspan should do it. For picture of said flying pig see:

http://www.cincinnati.com/bigpiggig/profile_091700.html
http://www.cincinnati.com/bigpiggig/images/pig091700.jpg

Rick



Re: Nicest UTF.. UTF-9, UTF-36, UTF-80, UTF-64, ...

2004-12-07 Thread Philippe Verdy
From: Kenneth Whistler [EMAIL PROTECTED]
Yes, and pigs could fly, if they had big enough wings.
Once again, this is a creative comment. As if Unicode had to be bound on 
architectural constraints such as the requirement of representing code units 
(which are architectural for a system) only as 16-bit or 32-bit units, 
ignoring the fact that technologies do evolve and will not necessarily keep 
this constraint. 64-bit systems already exist today, and even if they have, 
for now, the architectural capability of handling efficiently 16-bit and 
32-bit code units so that they can be addressed individually, this will 
possibly not be the case in the future.

When I look at the encoding forms such as UTF-16 and UTF-32, they just 
define the value ranges in which code units will be valid, but not 
necessarily their size. You are mixing this with encoding schemes, which is 
what is needed for interoperability, and where other factors such as bit or 
byte ordering is also important in addition to the value range.

I won't see anything wrong if a system is set so that UTF-32 code units will 
be stored in 24-bit or even 64-bit memory cells, as long as they respect and 
fully represent the value range defined in encoding forms, and if the system 
also provides an interface to convert them with encoding schemes to 
interoperable streams of 8-bit bytes.

Are you saying that UTF-32 code units need to be able to represent any 
32-bit value, even if the valid range is limited, for now to the 17 first 
planes?
An API on a 64-bit system that would say that it requires strings being 
stored with UTF-32 would also define how UTF-32 code units are represented. 
As long as the valid range 0 to 0x10FFFF can be represented, this interface 
will be fine. If this system is designed so that two or three code units 
will be stored in a single 64-bit memory cell, no violation will occur in 
the valid range.

More interestingly, there already exist systems where memory is addressable 
by units of 1 bit, and on these systems, a UTF-32 code unit will work 
perfectly if code units are stored in steps of 21 bits of memory. On 64-bit 
systems, the possibility of addressing any group of individual bits will 
become an interesting option, notably when handling complex data structures 
such as bitfields, data compressors, bitmaps, ... No more need to use costly 
shifts and masking. Nothing would prevent such system to offer 
interoperability with 8-bit byte based systems (note also that recent memory 
technologies use fast serial interfaces instead of parallel buses, so that 
the memory granularity is less important).

The only cost for bit-addressing is that it just requires 3 bits of address, 
but in a 64-bit address, this cost seems very low because the global 
addressable space will still be... more than 2.3*10^18 bytes, much more than 
any computer will manage in a single process for the next century (according 
to Moore's law which doubles the computing capabilities every 3 years). 
Even such a scheme would not limit the performance given that memory caches 
are paged, and these caches are always increasing, eliminating most of the 
costs and problems related to data alignment experienced today on bus-based 
systems.

Other territories are also still unexplored in microprocessors, notably the 
possibility of using non-binary numeric systems (think about optical or 
magnetic systems which could outperform the current electric systems due to 
reduced power and heat caused by currents of electrons through molecular 
substrates, replacing them by shifts of atomic states caused by light rays, 
and the computing possibilities offered by light diffraction through 
crystals). The lowest granularity of information in some future may be 
larger than a dual-state bit, meaning that today's 8-bit systems would need 
to be emulated using other numerical systems...
(Note for example that to store the range 0..0x10FFFF, you would need 13 
digits on a ternary system, and to store the range of 32-bit integers, you 
would need 21 ternary digits; memory technologies for such systems may use 
byte units made of 6 ternary digits, so programmers would have the choice 
between 3 ternary bytes, i.e. 18 ternary digits, to store our 21-bit code 
units, or 4 ternary bytes, i.e. 24 ternary digits or more than 34 binary 
bits, to be able to store the whole 32-bit range.)

Nothing there is impossible for the future (when it will become more and 
more difficult to increase the density of transistors, or to reduce further 
the voltage, or to increase the working frequency, or to avoid the 
inevitable and random presence of natural defects in substrates; escaping 
from the historic binary-only systems may offer interesting opportunities 
for further performance increase).




If only MS Word was coded this well (was Re: Nicest UTF)

2004-12-07 Thread Theodore H. Smith
From: D. Starner [EMAIL PROTECTED]

(Sorry for sending this twice, Marcin.)
Marcin 'Qrczak' Kowalczyk writes:
UTF-8 is poorly suitable for internal processing of strings in a
modern programming language (i.e. one which doesn't already have a
pile of legacy functions working of bytes, but which can be designed
to make Unicode convenient at all). It's because code points have
variable lengths in bytes, so extracting individual characters is
almost meaningless
Same with UTF-16 and UTF-32. A character is multiple code-points, 
remember? (decomposed chars?)

(unless you care only about the ASCII subset, and
sequences of all other characters are treated as non-interpreted bags
of bytes).
Nope. I've done tons of UTF-8 string processing. I've even done a case 
insensitive word-frequency measuring algorithm on UTF-8. It runs 
blastingly fast, because I can do the processing with bytes.

It just requires you to understand the actual logic of UTF-8 well 
enough to know that you can treat it as bytes, most of the time.

And the times you can't treat it as bytes, usually you can't even treat 
UTF-32 as bytes!
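
A small sketch of why that works (Python): no byte of a multi-byte UTF-8 sequence falls in the ASCII range, so byte-level operations keyed on ASCII delimiters never split a character.

    data = "naïve,Καλημέρα,日本語".encode("utf-8")

    fields = data.split(b",")                    # byte-level split on an ASCII comma
    print([f.decode("utf-8") for f in fields])   # every piece is still valid UTF-8

    # Every byte of every multi-byte sequence is >= 0x80:
    assert all(b >= 0x80 for c in "ïΚ本" for b in c.encode("utf-8"))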

If you are talking about creating an editfield or text control or 
something, that is true that UTF-32 is better. However, UTF-16 is the 
worst of all cases, you'd be better off using UTF-8 as the native 
encoding of an editfield.

The thing is, very very very few people write editfields.
I've seen tons of XML parsers in my lifetime (at least 3 I wrote 
myself), but only a few editfield libraries.

It's a shame that very few people understand the different UTFs properly.
As for isspace... sure there are spaces in UTF-8 that aren't a single byte.
My case-insensitive UTF-8 word frequency counter (which runs blastingly 
fast) however didn't find this to be any problem. It dealt with 
all sorts of non-single-byte word breaks :o)

It appears to run at about 3MB/second on my laptop, which involves for 
every word, doing a word check on the entire previous collection of 
words.

That's like having MS Word spell-check 3MB of pure Unicode text (no 
style junk bloating up the file-size) in one second, for you. (The 
words would all be spelt correctly though, so as to not require 
expensive RAM copying when doing the replacements.)


Yes, I do know how to code ;o)
Too bad so few others do.
--
   Theodore H. Smith - Software Developer - www.elfdata.com/plugin/
   Industrial strength string processing code, made easy.
   (If you believe that's an oxymoron, see for yourself.)



Re: Invalid UTF-8 sequences (was: Re: Nicest UTF)

2004-12-07 Thread Philippe Verdy
I know what you mean here: 
most Linux/Unix filesystems (as well as many legacy filesystems for Windows 
and MacOS...) do not track the encoding with which filenames were encoded 
and, depending on local user preferences when that user created that file, 
filenames on such systems seem to have unpredictable encodings.

However the problem comes, most often, when interchanging data from one 
system to another, through removeable volumes or shared volumes.

Needless to say, these systems were badly designed at their origin, and 
newer filesystems (and OS APIs) offer a much better alternative, by either 
storing explicitly on volumes which encoding they use, or by forcing all 
user-selected encodings to a common kernel encoding such as Unicode encoding 
schemes (this is what FAT32 and NTFS do on filenames created under Windows, 
since Windows 98 or NT).

I understand that there may exist situations, such as Linux/Unix UFS-like 
filesystems where it will be hard to decide which encoding was used for 
filenames (or simply for the content of plain-text files). For plain-text 
files, which have long-enough data in them, automatic identification of the 
encoding is possible, and used with success in many applications (notably in 
web browsers).

But for filenames, which are generally short, automatic identification is 
often difficult. However, UTF-16 remains easy to identify, most often, due 
to the very unusual frequency of low-values in byte sequences on every even 
or odd position. UTF-8 is also easy to identify due to its strict rules 
(without these strict rules, that forbid some sequences, automatic 
identification of the encoding becomes very risky).

If the encoding cannot be identified precisely and explicitly, I think that 
UTF-16 is much better than UTF-8 (and it also offers a better compromise for 
total size for names in any modern language). However, it's true that UTF-16 
cannot be used on Linux/Unix due to the presence of null bytes. The 
alternative is then UTF-8, but it is often larger than legacy encodings.

An alternative can then be a mixed encoding selection:
- choose a legacy encoding that will most often be able to represent valid 
filenames without loss of information (for example ISO-8859-1, or Cp1252).
- encode the filename with it.
- try to decode it with a *strict* UTF-8 decoder, as if it was UTF-8 
encoded.
- if there's no failure, then you must reencode the filename with UTF-8 
instead, even if the result is longer.
- if the strict UTF-8 decoding fails, you can keep the filename in the first 
8-bit encoding...
When parsing files:
- try decoding filenames with *strict* UTF-8 rules. If this does not fail, 
then the filename was effectively encoded with UTF-8.
- if the decoding failed, decode the filename with the legacy 8-bit 
encoding.

But even with this scheme, you will find interoperability problems because 
some applications will only expect the legacy encoding, or only the UTF-8 
encoding, without deciding...
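
A minimal sketch of that selection logic, assuming ISO-8859-1 as the legacy encoding (the function names are illustrative only):

    def encode_filename(name: str) -> bytes:
        try:
            raw = name.encode("latin-1")
        except UnicodeEncodeError:
            return name.encode("utf-8")      # not representable in the legacy encoding
        try:
            raw.decode("utf-8")              # would a strict UTF-8 decoder accept it?
        except UnicodeDecodeError:
            return raw                       # no: safe to keep the legacy bytes
        return name.encode("utf-8")          # yes: must re-encode as UTF-8 instead

    def decode_filename(raw: bytes) -> str:
        try:
            return raw.decode("utf-8")       # try strict UTF-8 first
        except UnicodeDecodeError:
            return raw.decode("latin-1")     # otherwise assume the legacy encoding

    print(decode_filename(encode_filename("café")))   # café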




Re: Nicest UTF.. UTF-9, UTF-36, UTF-80, UTF-64, ...

2004-12-07 Thread Kenneth Whistler
Philippe continued:

 As if Unicode had to be bound on 
 architectural constraints such as the requirement of representing code units 
 (which are architectural for a system) only as 16-bit or 32-bit units, 

Yes, it does. By definition. In the standard.

 ignoring the fact that technologies do evolve and will not necessarily keep 
 this constraint. 64-bit systems already exist today, and even if they have, 
 for now, the architectural capability of handling efficiently 16-bit and 
 32-bit code units so that they can be addressed individually, this will 
 possibly not be the case in the future.

This is just as irrelevant as worrying about the fact that 8-bit
character encodings may not be handled efficiently by some 32-bit
processors.

 When I look at the encoding forms such as UTF-16 and UTF-32, they just 
 define the value ranges in which code units will be be valid, but not 
 necessarily their size. 

Philippe, you are wrong. Go reread the standard. Each of the encoding
forms is *explicitly* defined in terms of code unit size in bits.

  The Unicode Standard uses 8-bit code units in the UTF-8 encoding
   form, 16-bit code units in the UTF-16 encoding form, and 32-bit
   code units in the UTF-32 encoding form.
   
If there is something ambiguous or unclear in wording such as that,
I think the UTC would like to know about it.

 You are mixing this with encoding schemes, which is 
 what is needed for interoperability, and where other factors such as bit or 
 byte ordering is also important in addition to the value range.

I am not mixing it up -- you are, unfortunately. And it is most
unhelpful on this list to have people waxing on, with
apparently authoritative statements about the architecture
of the Unicode Standard, which on examination turn out to be
flat wrong.

 I won't see anything wrong if a system is set so that UTF-32 code units will 
 be stored in 24-bit or even 64-bit memory cells, as long as they respect and 
 fully represent the value range defined in encoding forms, 

Correct. And I said as much. There is nothing wrong with implementing
UTF-32 on a 64-bit processor. Putting a UTF-32 code point into
a 64-bit register is fine. What you have to watch out for is
handing me a 64-bit array of ints and claiming that it is a
UTF-32 sequence of code points -- it isn't.

 and if the system 
 also provides an interface to convert them with encoding schemes to 
 interoperable streams of 8-bit bytes.

No, you have to have an interface which hands me the correct
data type when I declare it uint_32, and which gives me correct
offsets in memory if I walk an index pointer down an array.
That applies to the encoding *form*, and is completely separate
from provision of any streaming interface that wants to feed
data back and forth in terms of byte streams.

 Are you saying that UTF-32 code units need to be able to represent any 
 32-bit value, even if the valid range is limited, for now to the 17 first 
 planes?

Yes.

 An API on a 64-bit system that would say that it requires strings being 
 stored with UTF-32 would also define how UTF-32 code units are represented. 
 As long as the valid range 0 to 0x10FFFF can be represented, this interface 
 will be fine. 

No, it will not. Read the standard.

An API on a 64-bit system that uses an unsigned 32-bit datatype for UTF-32
is fine. It isn't fine if it uses an unsigned 64-bit datatype for
UTF-32.

 If this system is designed so that two or three code units 
 will be stored in a single 64-bit memory cell, no violation will occur in 
 the valid range.

You can do whatever the heck crazy thing you want to do internal
to your data manipulation, but you cannot surface a datatype
packed that way and conformantly claim that it is UTF-32.

 More interestingly, there already exists systems where memory is adressable 
 by units of 1 bit, and on these systems, ...

[excised some vamping on the future of computers]

 Nothing there is impossible for the future (when it will become more and 
 more difficult to increase the density of transistors, or to reduce further 
 the voltage, or to increase the working frequency, or to avoid the 
 inevitable and random presence of natural defects in substrates; escaping 
 from the historic binary-only systems may offer interesting opportunities 
 for further performance increase).

Look, I don't care if the processors are dealing in qubits on
molecular arrays under the covers. It is the job of the hardware
folks to surface appropriate machine instructions that compiler
makers can use to surface appropriate formal language constructs
to programmers to enable hooking the defined datatypes of
the character encoding standards into programming language
datatypes.

It is the job of the Unicode Consortium to define the encoding
forms for representing Unicode code points, so that people
manipulating Unicode digital text representation can do so
reliably using general purpose programming languages with
well-defined textual data constructs. I 

RE: Invalid UTF-8 sequences (was: Re: Nicest UTF)

2004-12-07 Thread Kenneth Whistler
Lars,

I'm going to step in here, because this argument seems to
be generating more heat than light.

 I never said it doesn't violate any existing rules. Stating that it does,
 doesn't help a bit. Rules can be changed. 

 I ask you to step back and try to see the big picture.

First, I'm going to summarize what I think Lars Kristan is
suggesting, to test whether my understanding of the proposal
is correct or not.

I do not think this is a proposal to amend UTF-8 to allow
invalid sequences. So we should get that off the table.

What I think this suggestion is is for adding 128 characters
to represent byte values in conversion to Unicode when the
byte values are uninterpretable as characters. Why 128 instead
of 256 I find a little mysterious, but presumably the intent
is to represent 0x80..0xFF as raw, uninterpreted byte values,
unconvertible to Unicode characters otherwise.

This is suggested by Lars' use case of:

 Storing UNIX filenames in a Windows database.

... since UNIX filenames are simply arrays of bytes, and cannot,
on interconnected systems, necessarily be interpreted in terms
of well-defined characters.

Apparently Lars is currently using PUA U+E080..U+E0FF
(or U+EE80..U+EEFF ?) for this purpose, enabling the round-tripping
of byte values uninterpretable as characters to be converted, and
is asking for standard Unicode values for this purpose, instead.

The other use case that Lars seems to be talking about are
existing documents containing data corruptions in them, which
can often happen when Latin-1 data gets dropped into UTF-8 data
or vice versa due to mislabeled email or whatever.

 So you would drop the data. There are only two options with current designs.
 Dropping invalid sequences, or storing it separately (which probably means
 the whole document is dead until manually decoded). Dropping invalid
 sequences is actually a better choice. And would even be justifiable (but
 still sometimes inconvenient) if we were living in world where everything is
 in UTF-8. In a world, trying to transition from legacy encodings to Unicode,
 there could be a lot of data lost and a lot of angry users.

And I am assuming this is referring primarily to the second case,
where the extreme scenario Lars is envisioning would be, for
example, where each point in a system was hyper-alert to
invalid sequences and simply tossed or otherwise sequestered
entire documents if they got these kinds of data corruptions
in them. And in such a case, I can understand the concern about
angry users. How many people on this list would be cursing if
every bit of email that had a character set conversion error in
it resulting in some bit hash or other, simply got tossed in the
bit bucket instead of being delivered with the glorious hash
intact, at least giving you the chance to see if you could
figure out what was intended?

 A UTF-16 based program will only be able to process valid UTF-8
 data. A UTF-8 based program will in many cases preserve invalid sequences
 even without any effort. Let me guess, you will say it is a flaw in the
 UTF-8 based program. If validation is desired, yes. But then I think you
 would want all UTF-8 based programs to do that. That will not happen. What
 will happen is that UTF-8 based programs will be better text editors
 (because they will not lose data or constantly complain), while UTF-16 based
 programs will produce cleaner data. You will opt for the latter. 

This is, I think the basic point at which people are talking past each
other.

Notionally, Doug is correct that UTF-8 and UTF-16 are equivalent
encoding forms, and anything represented (correctly) in one can
be represented (correctly) in the other. In that sense, there is
no difference between representation of text in UTF-8 or UTF-16,
and no reason to postulate that a UTF-8 based program will have
any advantages or disadvantages over a UTF-16 based program when
it comes to dealing with corrupted data.

What Lars is talking about is a broad class of UNIX-based software
which is written to handle strings essentially as
opaque bags of bytes, not caring what they contain for many
purposes. Such software generally keeps working just fine if you
pump UTF-8 at it, which is by design for UTF-8 -- precisely because
UTF-8 leaves untouched all the 0x00..0x7F byte values that may
have particular significance for those processes. Most of that
software treats 0x80..0xFF just as bit hash from the get-go, and
neither cares nor has any way of knowing if the particular
sequence of bit hash is valid UTF-8 or Shift-JIS or Latin-1 or
EUC-JIS or some mix or whatever.

 And I for
 the former. But will users know exactly what they've got? Will designers
 know exactly what they're gonna get? This is where all this started. I
 stated that there is an important difference between deciding for UTF-8 or
 for UTF-16 (or UTF-32).

This is where this is all getting derailed. Whatever the solutions
for representation of corrupt data bytes or uninterpreted data
bytes on 

Re: Invalid UTF-8 sequences (was: Re: Nicest UTF)

2004-12-07 Thread John Cowan
Kenneth Whistler scripsit:

 Storage of UNIX filenames on Windows databases, for example,
 can be done with BINARY fields, which correctly capture the
 identity of them as what they are: an unconvertible array of
 byte values, not a convertible string in some particular
 code page.

This solution, however, is overkill, in the same way that it would
be overkill to encode all 8-bit strings in XML using Base-64
just because some of them may contain control characters that are
illegal in well-formed XML.

 In my opinion, trying to do that with a set of encoded characters
 (these 128 or something else) is *less* likely to solve the
 problem than using some visible markup convention instead.

The trouble with the visible markup, or even the PUA, is that
well-formed filenames, those which are interpretable as
UTF-8 text, must also be encoded so as to be sure any
markup or PUA that naturally appears in the filename is
escaped properly.  This is essentially the Quoted-Printable
encoding, which is quite rightly known to those stuck with
it as Quoted-Unprintable.

 Simply
 encoding 128 characters in the Unicode Standard ostensibly to
 serve this purpose is no guarantee whatsoever that anyone would
 actually implement and support them in the universal way you
 envision, any more than they might a =93, =94 convention.

Why not, when it's so easy to do so?  And they'd be *there*,
reserved, unassignable for actual character encoding.

Plane E would be a plausible location.

-- 
John Cowan [EMAIL PROTECTED] http://www.reutershealth.com
I amar prestar aen, han mathon ne nen,http://www.ccil.org/~cowan
han mathon ne chae, a han noston ne 'wilith.  --Galadriel, LOTR:FOTR



Re: Invalid UTF-8 sequences (was: Re: Nicest UTF)

2004-12-07 Thread Doug Ewell
Philippe Verdy verdy underscore p at wanadoo dot fr wrote:

 An alternative can then be a mixed encoding selection:
 - choose a legacy encoding that will most often be able to represent
 valid filenames without loss of information (for example ISO-8859-1,
 or Cp1252).
 - encode the filename with it.
 - try to decode it with a *strict* UTF-8 decoder, as if it was UTF-8
 encoded.
 - if there's no failure, then you must reencode the filename with
 UTF-8 instead, even if the result is longer.
 - if the strict UTF-8 decoding fails, you can keep the filename in the
 first 8-bit encoding...
 When parsing files:
 - try decoding filenames with *strict* UTF-8 rules. If this does not
 fail, then the filename was effectively encoded with UTF-8.
 - if the decoding failed, decode the filename with the legacy 8-bit
 encoding.

 But even with this scheme, you will find interoperability problems
 because some applications will only expect the legacy encoding, or
 only the UTF-8 encoding, without deciding...

This technique was described as adaptive UTF-8 by Dan Oscarsson in
August 1998:

http://www.unicode.org/mail-arch/unicode-ml/Archives-Old/UML012/0738.html

although he did not go as far as Philippe did, in actually checking the
adaptively encoded string to make sure it would be decoded correctly.

All the same, it was decided not to go this route, partly because the
auto-detection capability of UTF-8 would be lost, partly because having
multiple context-dependent encodings of the same code points would have
been a Bad Thing (99 C9 could be encoded adaptively but C9 99 could
not), and partly for the reason Philippe mentions -- most existing
decoders would expect either Latin-1 or UTF-8, and would choke if handed
a mixture of the two.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/
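
The 99 C9 / C9 99 asymmetry can be checked with any strict decoder; in Python:

    try:
        b"\x99\xc9".decode("utf-8")
    except UnicodeDecodeError:
        print("99 C9: rejected, so it could stay adaptively encoded")

    print(b"\xc9\x99".decode("utf-8"))   # 'ə' (U+0259): well-formed UTF-8, so it could not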





Re: Invalid UTF-8 sequences (was: Re: Nicest UTF)

2004-12-07 Thread Doug Ewell
Kenneth Whistler kenw at sybase dot com wrote:

 I do not think this is a proposal to amend UTF-8 to allow
 invalid sequences. So we should get that off the table.

I hope you are right.

 Apparently Lars is currently using PUA U+E080..U+E0FF
 (or U+EE80..U+EEFF ?) for this purpose, enabling the round-tripping
 of byte values uninterpretable as characters to be converted, and
 is asking for standard Unicode values for this purpose, instead.

If I understand correctly, he is using these PUA values when the data is
in UTF-16, and using bare high-bit bytes (i.e. invalid UTF-8 sequences)
when the data is in UTF-8, and expecting to convert between the two.
That has at least two bad implications:

(1) the PUA characters would not round-trip from UTF-8 to UTF-16 to
UTF-8, but would be converted to the bare high-bit bytes, and

(2) the bare high-bit bytes might or might not accidentally form valid
UTF-8 sequences, which means they might not round-trip either.

 Say a process gets handed a UTF-8 string that contains the
 byte sequence 61 62 63 93 4D D0 B0 E4 BA 8C F0 90 8C 82 94.
 ^^   ^^

 The 93 and 94 are just corrupt data -- it cannot be interpreted
 as UTF-8, and may have been introduced by some process that
 screwed up smart quotes from Code Page 1252 and UTF-8, for
 example. Interpreting the string, we have:

 U+0061, U+0062, U+0063, ???, U+004D, U+0430, U+4E8C, U+10302, ???

 Now *if* I am interpreting Lars correctly, he is using 128
 PUA code points to *validly* contain any such byte, so that
 it can be retained. If the range he is using is U+EE80..U+EEFF,
 then the string would be reinterpreted as:

 U+0061, U+0062, U+0063, U+EE93, U+004D, U+0430, U+4E8C, U+10302,
 U+EE94

 which in UTF-8 would be the byte sequence:

 61 62 63 EE BA 93 4D D0 B0 E4 BA 8C F0 90 8C 82 EE BA 94
      

 This is now well-formed UTF-8, which anybody could deal with.
 And if you interpret U+EE93 as meaning a placeholder for the
 uninterpreted or corrupt byte 0x93 in the original source,
 and so on, you could use this representation to exactly
 preserve the original information, including corruptions,
 which you could feed back out, byte-for-byte, if you reversed
 the conversion.

Oh, how I hope that is all he is asking for.

 Now moving from interpretation to critique, I think it unlikely
 that the UTC would actually want to encode 128 such characters
 to represent byte values -- and the reasons would be similar to
 those adduced for rejecting the earlier proposal. Effectively,
 in either case, these are proposals for enabling representation
 of arbitrary, embedded binary data (byte streams) in plain text.
 And that concept is pretty fundamentally antithetical to the
 Unicode concept of plain text.

Isn't this an excellent use for the PUA?  These characters are private
anyway; they are defined by some standard other than Unicode, which is
not evident in the Unicode data.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/
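
A sketch of the round trip Ken walks through in the passage quoted above, using the U+EE80..U+EEFF placeholders he infers; the helper names are illustrative and nothing here is standardized:

    PUA_BASE = 0xEE00

    def bytes_to_text(raw: bytes) -> str:
        out, i = [], 0
        while i < len(raw):
            for width in (4, 3, 2, 1):                 # longest well-formed slice wins
                try:
                    out.append(raw[i:i + width].decode("utf-8"))
                    i += width
                    break
                except UnicodeDecodeError:
                    continue
            else:
                out.append(chr(PUA_BASE + raw[i]))     # corrupt byte -> U+EE80..U+EEFF
                i += 1
        return "".join(out)

    def text_to_bytes(text: str) -> bytes:
        # Assumes U+EE80..U+EEFF are reserved for this escaping and never
        # appear as ordinary characters in the text.
        out = bytearray()
        for ch in text:
            if 0xEE80 <= ord(ch) <= 0xEEFF:
                out.append(ord(ch) - PUA_BASE)         # restore the original byte
            else:
                out.extend(ch.encode("utf-8"))
        return bytes(out)

    raw = bytes.fromhex("61 62 63 93 4D D0 B0 E4 BA 8C F0 90 8C 82 94")
    assert text_to_bytes(bytes_to_text(raw)) == raw    # byte-for-byte round trip
    print([f"U+{ord(c):04X}" for c in bytes_to_text(raw)])
    # matches the code point list in Ken's example, including U+EE93 and U+EE94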





Re: Invalid UTF-8 sequences (was: Re: Nicest UTF)

2004-12-07 Thread Doug Ewell
RE: Invalid UTF-8 sequences (was: Re: Nicest UTF)
Lars Kristan wrote:

 I never said it doesn't violate any existing rules. Stating that it
 does, doesn't help a bit. Rules can be changed. Assuming we understand
 the consequences. And that is what we should be discussing. By stating
 what should be allowed and what should be prohibited you are again
 defending those rules. I agree, rules should be defended, but only up
 to a certain point. Simply finding a rule that is violated is not
 enough to prove something is bad or useless.

In my opinion, these are rules that should not be broken or changed, NOT
because changing the rules is inherently bad but because these
particular changes would cause more problems than they would solve.  In
my opinion.

 Defining Unicode as the world of codepoints is a complex task on its
 own. It seems that you are afraid of stepping out of this world, since
 you do not know what awaits you there. So, it is easier to find an
 excuse within existing rules, especially if a proposed change
 threatens to shake everything right down to the foundation. If I were
 dealing with Unicode (as we know it), I would probably be doing the
 same thing. I ask you to step back and try to see the big picture.

My objection to this has nothing to do with being some kind of
conservative fuddy-duddy who is afraid to think outside the box.

 Do you have a use case for this?

 Yes, I definitely have. I am the one accusing you of living in a
 perfect world, remember?

Yes, I remember.  Thank you.

 Do you think I would do that if I wasn't dealing with this problem in
 real life?

The problem seems to be that you have file names in a Unix or Unix-like
file system, where names are stored as uninterpreted bytes (thanks to
everyone who pointed this out; I have learned something), and these
bytes need to remain valid if the locale specifies UTF-8 and the bytes
don't make a valid UTF-8 sequence.  Right?

How do file names work when the user changes from one SBCS to another
(let's ignore UTF-8 for now) where the interpretation is different?  For
example, byte C3 is U+00C3, A with tilde (Ã), in ISO 8859-1, but U+0102,
A with breve (Ă), in ISO 8859-2.  If a file name contains byte C3, is its
name different depending on the current locale?  Is it accessible in all
locales?  (Not every SBCS defines a character at every code point.
There's no C3 in ISO 8859-3, for example.)
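
A small illustration of the question (a Python sketch with a hypothetical
file name, not taken from any real system):

    name = b'Caf\xc3'                    # hypothetical name ending in byte C3
    print(name.decode('iso-8859-1'))     # 'CafÃ'  (A with tilde)
    print(name.decode('iso-8859-2'))     # 'CafĂ'  (A with breve)
    try:
        name.decode('iso-8859-3')        # C3 is unassigned in ISO 8859-3
    except UnicodeDecodeError:
        print('byte C3 maps to no character in ISO 8859-3')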

Does this work with MBCS other than UTF-8?  I know you said other MBCS,
like Shift-JIS, are not often used alongside other encodings except
ASCII, but we can't guarantee that since we're not in a perfect world.
:-)  What if they were?

If you have a UTF-8 locale, and file names that contain invalid UTF-8
sequences, how would you address those files in a locale-aware way?
This is similar to the question about the file with byte C3, which is Ã
in one locale, Ă in another, and an unassigned code point in a third.

 It is the current design that is unfair. A UTF-16 based program will
 only be able to process valid UTF-8 data. A UTF-8 based program will
 in many cases preserve invalid sequences even without any effort.

I fear Ken is not correct when he says you are not arguing for the
legalization of invalid UTF-8 sequences.

 Let me guess, you will say it is a flaw in the UTF-8 based program.

Good guess.  Unicode and ISO/IEC 10646 say it is, and I say it is.

 If validation is desired, yes. But then I think you would want all
 UTF-8 based programs to do that. That will not happen. What will
 happen is that UTF-8 based programs will be better text editors
 (because they will not lose data or constantly complain), while UTF-16
 based programs will produce cleaner data. You will opt for the latter.
 And I for the former. But will users know exactly what they've got?
 Will designers know exactly what they're gonna get? This is where all
 this started. I stated that there is an important difference between
 deciding for UTF-8 or for UTF-16 (or UTF-32).

This isn't about UTF-8 versus other encoding forms.  UTF-8-based
programs will reject these invalid sequences because they don't map to
code points, and because they are supposed to reject them.

 BTW, you have mixed up source and target. Or I don't understand what
 you're trying to say.

You are right.  I spoke of translating German to French, when the
example was about going the other way.  I made a mistake.

 Besides, surrogates are not completely interchangeable. Frankly, they
 are, but do not need to be, right?

They are not completely interchangeable.  In UTF-8 and UTF-32, they are not allowed at
all.  In UTF-16, they may only occur in the proper context: a high
surrogate may only occur before a low surrogate, and a low surrogate may
only appear after a high surrogate.  No other usage of surrogates is
permitted, because if unpaired surrogates could be interpreted, the
interpretation would be ambiguous.
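
As a minimal sketch of that pairing rule (Python, illustrative only), a
check over a sequence of 16-bit code units:

    def has_unpaired_surrogate(units):
        # units: a list of 16-bit UTF-16 code units given as integers
        i = 0
        while i < len(units):
            u = units[i]
            if 0xD800 <= u <= 0xDBFF:          # high surrogate...
                if i + 1 == len(units) or not (0xDC00 <= units[i + 1] <= 0xDFFF):
                    return True                # ...not followed by a low one
                i += 2                         # well-formed pair
            elif 0xDC00 <= u <= 0xDFFF:        # low surrogate with no high before it
                return True
            else:
                i += 1
        return False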

 Instead of using the PUA, I could have chosen unpaired surrogates. But
 would risk that a UTF-16 validator

Re: Nicest UTF

2004-12-06 Thread Arcane Jill
Probably a dumb question, but how come nobody's invented UTF-24 yet? I 
just made that up, it's not an official standard, but one could easily 
define UTF-24 as UTF-32 with the most-significant byte (which is always 
zero) removed, hence all characters are stored in exactly three bytes and 
all are treated equally. You could have UTF-24LE and UTF-24BE variants, and 
even UTF-24 BOMs. Of course, I'm not suggesting this is a particularly 
brilliant idea, but I just wonder why no-one's suggested it before.

(And then of course, there's UTF-21, in which blocks of 21 bits are 
concatenated, so that eight Unicode characters will be stored in every 21 
bytes - and not to mention UTF-20.087462841250343, in which a plain text 
document is simply regarded as one very large integer expressed in radix 
1114112, and whose UTF-20.087462841250343 representation is simply that 
number expressed in binary. But now I'm getting /very/ silly - please don't 
take any of this seriously.)  :-)

The UTF-24 thing seems a reasonably sensible question though. Is it just 
that we don't like it because some processors have alignment restrictions or 
something?

Arcane Jill
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Behalf Of Marcin 'Qrczak' Kowalczyk
Sent: 02 December 2004 16:59
To: [EMAIL PROTECTED]
Subject: Re: Nicest UTF
Arcane Jill [EMAIL PROTECTED] writes:
Oh for a chip with 21-bit wide registers!
Not 21-bit but 20.087462841250343-bit :-)
--
__( Marcin Kowalczyk
\__/ [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/ 




RE: Nicest UTF

2004-12-06 Thread Lars Kristan
Title: RE: Nicest UTF






Doug Ewell wrote:
 Lars Kristan wrote:
 
  I think UTF8 would be the nicest UTF.
 
  I agree. But not for reasons you mentioned. There is one other
  important advantage: UTF-8 is stored in a way that permits storing
  invalid sequences. I will need to elaborate that, of course.
 
 I could not disagree more with the basic premise of Lars' 
 post. It is a
 fundamental and critical mistake to try to extend Unicode with
 non-standard code unit sequences to handle data that cannot be, or has
 not been, converted to Unicode from a legacy standard. This 
 is not what
 any character encoding standard is for.
What a standard is or is not for is a decision. And the Unicode Consortium is definitely the body that makes the decision in this case. But this decision should not be based solely on theory and ideal worlds.

 
  1.2 - Any data for which encoding is not known can only be 
 stored in a
  UTF-16 database if it is converted. One needs to choose a conversion
  (say Latin-1, since it is trivial). When a user finds out that the
  result is not appealing, the data needs to be converted back to the
  original 8-bit sequence and then the user (or an algorithm) can try
  various encodings until the result is appealing.
 
 This is simply what you have to do. You cannot convert the data into
 Unicode in a way that says I don't know how to convert this data into
 Unicode. You must either convert it properly, or leave the 
 data in its
 original encoding (properly marked, preferably).
Here lies the problem. Suppose you have a document in UTF-8, which somehow got corrupted and now contains a single invalid sequence. Are you proposing that this document needs to be stored separately? Everything else in the database would be stored in UTF-16, but now one must add the capability to store this document separately. And probably not index it. Regardless of any useful data in it. But if you use UTF-8 storage instead, you can put it in with the rest (if you can mark it, even better, but you only need to do it if that is a requirement).

 
 It is just as if a German speaker wanted to communicate a 
 word or phrase
 in French that she did not understand. She could find the correct
 German translation and use that, or she could use the French word or
 phrase directly (moving the translation burden onto the 
 listener). What
 she cannot do is extend German by creating special words that are
 placeholders for French words whose meaning she does not know.
I can reinterpret your example. Using the French word is exactly the solution I am proposing, and I see your solution is to replace the word with a placeholder which says a word that does not exist in German. Even worse, you want to use the same placeholder for all the unknown words. Numbering them would be better, but awkward, since you don't know how to assign numbers. Fortunately, with bytes in invalid sequences, the numbering is trivial and has a meaning.

Let's compare UTF-8 to UTF-16 conversion to an automated translation from German to French. What Unicode standard says can be interpreted as follows:

* All input text must be valid German language.
* All output text must be valid French language.
* Any unknown words shall be replaced by a (single) 'unknown word' placeholder.


And that last statement goes for German words missing in your dictionary, misspelled words, Spanish words, proper nouns...

 
  2.2 - Any data for which encoding is not known can simply be stored
  as-is.
 
 NO. Do not do this, and do not encourage others to do this. 
 It is not
 valid UTF-8.
I never said it is valid UTF-8. The fact remains that I can store legacy data in the same store as UTF-8 data. But I cannot do that if storage is UTF-16 based.

Now suppose you have a UNIX filesystem, containing filenames in a legacy encoding (possibly even more than one). If one wants to switch to UTF-8 filenames, what is one supposed to do? Convert all filenames to UTF-8? Who will do that? And when? Will all users agree? Should all filenames that do not conform to UTF-8 be declared invalid? And those files inaccessible? If you keep all processing in UTF-8, then this is a decision you can postpone. But if you start using UTF-32 applications for processing filenames, invalid sequences will be dropped and those files can in fact become inaccessible. And then you'll be wondering why users don't want to start using Unicode.

I didn't encourage users to mix UTF-8 filenames and Latin 1 filenames. Do you want to discourage them?


 
 Among other things, you run the risk that the mystery data happens to
 form a valid UTF-8 sequence, by sheer coincidence. The example of
 NESTLÉ™ in Windows CP1252 is applicable here. The last two bytes are
 C9 99, a valid UTF-8 sequence for U+0259. By applying the concept of
 adaptive UTF-8 (as Dan Oscarsson called it in 1998), this sequence
 would be interpreted as valid UTF-8, and data loss would occur.
I am well aware of that risk. But all you risk

Re: Nicest UTF

2004-12-06 Thread Doug Ewell
Arcane Jill arcanejill at ramonsky dot com wrote:

 Probably a dumb question, but how come nobody's invented UTF-24 yet?
 I just made that up, it's not an official standard, but one could
 easily define UTF-24 as UTF-32 with the most-significant byte (which
 is always zero) removed, hence all characters are stored in exactly
 three bytes and all are treated equally. You could have UTF-24LE and
 UTF-24BE variants, and even UTF-24 BOMs. Of course, I'm not suggesting
 this is a particularly brilliant idea, but I just wonder why no-one's
 suggested it before.

It has been suggested before, by Pim Blokland on April 3, 2003, in a
message titled UTF-24.  If you get the digest, it's in Digest V3 #79.

 The UTF-24 thing seems a reasonably sensible question though. Is it
 just that we don't like it because some processors have alignment
 restrictions or something?

Almost all do.  In addition, no programming language I know of has a
3-byte-wide integer data type (maybe INTERCAL does), so the efficiency
of UTF-24 would be wasted in software as well as in hardware.

Besides that, there were the usual protests that supplementary
characters would be vanishingly rare in the context of normal text,
and that one should use compression (SCSU/BOCU or GP tools) if size is
an issue.

None of this stopped me from experimentally implementing it, of course,
but I haven't touched it since finishing the implementation.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/





Re: Nicest UTF

2004-12-06 Thread Antoine Leca
Asmus Freytag wrote:
 A simplistic model of the 'cost' for UTF-16 over UTF-32 would consider
snip
 3) additional cost of accessing 16-bit registers (per character)
snip
 For many processors, item 3 is not an issue.

I do not know; I only know of a few of them. For example, I do not know how
Alpha or Sparc or PowerPC handle 16-bit data (I have heard differing accounts).
I agree this was not an issue for the 80386-80486 or Pentium. However, for the
more recent processors, P6, Pentium 4, or AMD K7 or K8, I am unsure, and I
would appreciate insights.

I remember reading that in the case of the AMD K7, for instance, 16-bit
instructions (all? a few of them? only ALU-related, i.e. excluding load and
store, which is the point here? I do not know) are handled in a different
way from the 32-bit ones, e.g. with a reduced number of decoders. The impact could
be really important.

I also remember that when the P6 was launched (1995, known as PentiumPro),
there was a lot of criticism of Intel because the performance of 16-bit
code was actually worse than on an equivalent Pentium (but there was an
advantage for 32-bit code); of course this should be considered in the
context of the time, when 16-bit (DOS/Windows 3.x) code was important, something that has
faded. But I believe the reasoning behind the arguments should still hold.

Finally, there is certainly an issue about the need to add a prefix on the
X86 processors. The issue is reduced for the Pentium 4 (because the prefix
does not consume space in the L1 cache); but it still holds for the L2 cache.
And the impact is noticeable; I do not have figures for access to UTF-16
data, but I know that when using 64-bit mode (with the AMD K8), the need to
have a prefix to access 64-bit data, thus consuming code cache space for it,
was given as the cause of a 1-3% penalty in execution time.

Of course, such a tiny penalty is easily hidden by other factors, such as
the others Dr. Freytag mentioned.


 Given this little model and some additional assumptions about your
 own project(s), you should be able to determine the 'nicest' UTF for
 your own performance-critical case.

My point was that the variability of the factors leads to keeping the three
UTFs as possible candidates when one considers writing a perfect-world
library. Can we say we are in agreement?

By the way, this will also mean that the optimisations to be considered
inside the library could be very different, since the optimal uses can be
significantly different. For example, use of UTF-32 might signal a user bias
toward easy management of codepoints, disregarding memory use, so the code
used in the library should favour time over space (so unrolling loops and
similar things could be considered).
UTF-8 /might/ be the reverse.


Antoine




Re: Nicest UTF

2004-12-06 Thread Andy Heninger
Asmus Freytag wrote:
A simplistic model of the 'cost' for UTF-16 over UTF-32 would consider
1) 1 extra test per character (to see whether it's a surrogate)
In my experience with tuning a fair amount of utf-16 software, this test 
takes pretty close to zero time.  All modern processors have branch and 
pipeline trickery that fairly effectively disappears the cost of a 
predictable branch within a tight loop.  Occurrences of supplementary 
characters should generally be rare enough that the extra time to 
process them when they are encountered is not statistically significant.

2) special handling every 100 to 1000 characters (say 10 instructions)
3) additional cost of accessing 16-bit registers (per character)
4) reduction in cache misses (each the equivalent of many instructions)
This is a big deal.  The costs in plowing through lots of text data with 
relatively simple processing appear to be heavily related to the 
required memory bandwidth. Assuming reasonably carefully written code, 
that is.

5) reduction in disk access (each the equivalent of many, many instructions)
For many operations, e.g. string length, both 1, and 2 are no-ops,
so you need to apply a reduction factor based on the mix of operations
you do perform, say 50%-75%.
For many processors, item 3 is not an issue.
For 4 and 5, the multiplier is somewhere in the 100s or 1000s, for each
occurrence depending on the architecture. Their relative weight depends
not only on cache sizes, but also on how many other instructions per
character are performed. For text scanning operations, their cost
does predominate with large data sets.

--
 Andy Heninger
 [EMAIL PROTECTED]



Re: Nicest UTF

2004-12-06 Thread Marcin 'Qrczak' Kowalczyk
Lars Kristan [EMAIL PROTECTED] writes:

 This is simply what you have to do. You cannot convert the data
 into Unicode in a way that says I don't know how to convert this
 data into Unicode. You must either convert it properly, or leave
 the data in its original encoding (properly marked, preferably).

 Here lies the problem. Suppose you have a document in UTF-8, which
 somehow got corrupted and now contains a single invalid sequence.
 Are you proposing that this document needs to be stored separately?

He is not proposing that.

 Everything else in the database would be stored in UTF-16, but now
 one must add the capability to store this document separately.

No, it can be stored in UTF-16 or whatever else is used. Except the
corrupted part of course, but it's corrupted, and thus useless, so it
doesn't matter what happens with it.

 Now suppose you have a UNIX filesystem, containing filenames in a legacy
 encoding (possibly even more than one). If one wants to switch to UTF-8
 filenames, what is one supposed to do? Convert all filenames to UTF-8?

Yes.

 Who will do that?

A system administrator (because he has access to all files).

 And when?

When the owners of the computer system decide to switch to UTF-8.

 Will all users agree?

It depends on who decides about such things. Either they don't have a
voice, or they agree and the change is made, or they don't agree and
the change is not made. What's the point?

 Should all filenames that do not conform to UTF-8 be declared invalid?

What do you mean by invalid? They are valid from the point of view
of the OS, but they will not work with reasonable applications which
use Unicode internally.

 If you keep all processing in UTF-8, then this is a decision you can
 postpone.

You mean, various programs will break at various points of time,
instead of working correctly from the beginning?

If it's broken, fix it, instead of applying patches which will
sometimes hide the fact that it's broken, or sometimes not.

 I didn't encourage users to mix UTF-8 filenames and Latin 1 filenames.
 Do you want to discourage them?

Mixing any two incompatible filename encodings on the same file system
is a bad idea.

 IMHO, preserving data is more important, but so far it seems it is
 not a goal at all. With a simple argument - that Unicode only
 defines how to process Unicode data. Understandably so, but this
 doesn't mean it needs to remain so.

If you don't know the encoding and want to preserve the values of
bytes, then don't convert it to Unicode.

 Well, you may have a wrong assumption here. You probably think that
 I convert invalid sequences into PUA characters and keep them as
 such in UTF-8. That is not the case. Any invalid sequences in UTF-8
 are left as they are. If they need to be converted to UTF-16, then
 PUA is used. If they are then converted to UTF-8, they are converted
 back to their original bytes, hence the incorrect sequences are
 re-created.

This does not make sense. If you want to preserve the bytes instead
of working in terms of characters, don't convert it at all - keep the
original byte stream.

 One more example of data loss that arises from your approach: If a
 single bit is changed in UTF-16 or UTF-32, that is all that will
 happen (in more than 99% of the cases). If a single bit changes in
 UTF-8, you risk that the entire character will be dropped or
 replaced with the U+FFFD. But funny, only if it ever gets converted
 to the UTF-16 or UTF-32. Not that this is a major problem on its
 own, but it indicates that there is something fishy in there.

If you change one bit in a file compressed by gzip, you might not be
able to recover any part of it. What's the point?

UTF-x were not designed to minimize the impact of corruption of
encoded bytes. If you want to preserve the text despite occasional
corruption, use a higher level protocol for this (if I remember
correctly, RAR can add additional information to an archive which
allows recovering the data even if parts of the archive, entire
blocks, have been lost).

 There was a discussion on nul characters not so long ago. Many text
 editors do not properly preserve nul characters in text files.
 But it is definitely a nice thing if they do. While preserving nul
 characters only has a limited value, preserving invalid sequences
 in text files could be crucial.

An editor should alert the user that the file is not encoded in a
particular encoding or that it's corrupted, instead of trying to guess
which characters were supposed to be there.

If it's supposed to edit binary files too, it should work on the bytes
instead of decoded characters.

 A UTF-8 based editor can easily do this. A UTF-16 based editor
 cannot do it at all. If you say that UTF-16 is not intended for such
 a purpose, then so be it. But this also means that UTF-8 is superior.

It's much easier with CP-1252, which shows that it's superior to UTF-8
:-)

 Yes, it is not related much. Except for the fact I was trying to see
 if UTF-32 

Invalid UTF-8 sequences (was: Re: Nicest UTF)

2004-12-06 Thread Doug Ewell
Lars Kristan wrote:

 I could not disagree more with the basic premise of Lars' post.  It
 is a fundamental and critical mistake to try to extend Unicode with
 non-standard code unit sequences to handle data that cannot be, or
 has not been, converted to Unicode from a legacy standard.  This is
 not what any character encoding standard is for.

 What a standard is or is not for is a decision. And the Unicode Consortium
 is definitely the body that makes the decision in this case.

Actually the Unicode Technical Committee.  But you are correct: it is up
to the UTC to decide whether they want to redefine UTF-8 to permit
invalid sequences, which are to be interpreted as unknown characters
from an unknown legacy coding standard, and to prohibit conversion from
this redefined UTF-8 to other encoding schemes, or directly to Unicode
code points.  We will have to wait and see what UTC members think of
this.

 But this decision should not be based solely on theory and ideal
 worlds.

Right.  Uh-huh.

 This is simply what you have to do.  You cannot convert the data into
 Unicode in a way that says I don't know how to convert this data
 into Unicode.  You must either convert it properly, or leave the
 data in its original encoding (properly marked, preferably).

 Here lies the problem. Suppose you have a document in UTF-8, which
 somehow got corrupted and now contains a single invalid sequence. Are
 you proposing that this document needs to be stored separately?

Of course not.  That is not at all the same as INTENTIONALLY storing
invalid sequences in UTF-8 and expecting the decoding mechanism to
preserve the invalid bytes for posterity.

 Everything else in the database would be stored in UTF-16, but now one
 must add the capability to store this document separately. And
 probably not index it. Regardless of any useful data in it. But if you
 use UTF-8 storage instead, you can put it in with the rest (if you can
 mark it, even better, but you only need to do it if that is a
 requirement).

And do what with it, Lars?  Keep it on a shelf indefinitely in case some
archaeologist unearths a new legacy encoding that might unlock the
mystery data?

Is this really worth the effort of redefining UTF-8 and disallowing free
conversion between UTF-8 and Unicode code points?

Do you have a use case for this?

 I can reinterpret your example. Using the French word is exactly the
 solution I am proposing, and I see your solution is to replace the
 word with a placeholder which says a word that does not exist in
 German. Even worse, you want to use the same placeholder for all the
 unknown words. Numbering them would be better, but awkward, since you
 don't know how to assign numbers. Fortunately, with bytes in invalid
 sequences, the numbering is trivial and has a meaning.

So with your plan, you have invalid sequence #1, invalid sequence #2,
and so forth.  Now, what do the sequences mean?  Is there any way to
interpret them?  No, there isn't, because by definition these sequences
represent characters from an unknown coding standard.  Either (a) nobody
has gone to the trouble to find out what characters they truly
represent, (b) the original standard is lost and we will *never* know,
or (c) we are waiting for the archaeologist to save the day.

In the meantime, the UTF-8 data with invalid sequences must be kept
isolated from all processes that would interpret the sequences as code
points, and raise an exception on invalid sequences-- in other words,
all existing processes that handle UTF-8.

 Let's compare UTF-8 to UTF-16 conversion to an automated translation
 from German to French. What Unicode standard says can be interpreted
 as follows:

 * All input text must be valid German language.
 * All output text must be valid French language.
 * Any unknown words shall be replaced by a (single) 'unknown word'
   placeholder.

If you have French words that cannot be translated into German at all,
and nobody in the target audience is capable of understanding French,
then what you have is an inscrutable collection of mystery data, perhaps
suitable for research and examination by linguists, but not something
that the audience can make any sense of.  In that case, converting all
the mystery data to a single unknown word placeholder is no worse than
any other solution, and in particular, no worse than a solution that
converts 100 different mystery words into 100 different placeholders,
*none* of which the audience can decipher.

 And that last statement goes for German words missing in your
 dictionary, misspelled words, Spanish words, proper nouns...

The underlying assumption is that somebody, somewhere, will be able to
recognize these foreign or unrecognized words and make some sense of
them.  But in your character encoding example, the premise is that we
DON'T know what the original encoding was, and it's too difficult or
impossible to find out, so we just shoehorn them into UTF-8.  That's not
consistent with the German 

Re: Invalid UTF-8 sequences (was: Re: Nicest UTF)

2004-12-06 Thread John Cowan
Doug Ewell scripsit:

  Now suppose you have a UNIX filesystem, containing filenames in a
  legacy encoding (possibly even more than one). If one wants to switch
  to UTF-8 filenames, what is one supposed to do? Convert all filenames
  to UTF-8?
 
 Well, yes.  Doesn't the file system dictate what encoding it uses for
 file names?  How would it interpret file names with unknown characters
 from a legacy encoding?  How would they be handled in a directory
 search?

Windows filesystems do know what encoding they use.  But a filename on
a Unix(oid) file system is a mere sequence of octets, of which only 00
and 2F are interpreted.  (Filenames containing 20, and especially 0A,
are annoying to handle with standard tools, but not illegal.)

How these octet sequences are translated to characters, if at all,
is no concern of the file system's.  Some higher-level tools, such as
directory listers and shells, have hardwired assumptions, others have
changeable assumptions, but all are assumptions.
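
A small illustrative sketch (Python) of that split between octets and their
interpretation:

    import os
    # Passing bytes to os.listdir asks the OS for the raw octet sequences.
    for raw in os.listdir(b'.'):
        try:
            print(raw.decode('utf-8'))
        except UnicodeDecodeError:
            print(repr(raw), '- not interpretable as UTF-8, yet a legal name')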

-- 
John Cowan  [EMAIL PROTECTED]  www.reutershealth.com  www.ccil.org/~cowan
No man is an island, entire of itself; every man is a piece of the
continent, a part of the main.  If a clod be washed away by the sea,
Europe is the less, as well as if a promontory were, as well as if a
manor of thy friends or of thine own were: any man's death diminishes me,
because I am involved in mankind, and therefore never send to know for
whom the bell tolls; it tolls for thee.  --John Donne



Re: Invalid UTF-8 sequences (was: Re: Nicest UTF)

2004-12-06 Thread Doug Ewell
John Cowan jcowan at reutershealth dot com wrote:

 Windows filesystems do know what encoding they use.  But a filename on
 a Unix(oid) file system is a mere sequence of octets, of which only 00
 and 2F are interpreted.  (Filenames containing 20, and especially 0A,
 are annoying to handle with standard tools, but not illegal.)

 How these octet sequences are translated to characters, if at all,
 is no concern of the file system's.  Some higher-level tools, such as
 directory listers and shells, have hardwired assumptions, others have
 changeable assumptions, but all are assumptions.

OK, fair enough.  Under a Unixoid file system, a file name consists of a
more or less arbitrary sequence of bytes, essentially unregulated by the
OS.

If interpreted as UTF-8, some of these sequences may be invalid, and the
files may be inaccessible.

This is *exactly* the same scenario as with GB 2312, or Shift-JIS, or KS
C 5601, or ISO 6937, or any other multibyte character encoding ever
devised.

This is not a problem that needs to be solved within Unicode, any more
than it needed to be solved within those other encodings.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/





Re: Nicest UTF.. UTF-9, UTF-36, UTF-80, UTF-64, ...

2004-12-06 Thread Philippe Verdy
- Original Message - 
From: Arcane Jill [EMAIL PROTECTED]
Probably a dumb question, but how come nobody's invented UTF-24 yet? I 
just made that up, it's not an official standard, but one could easily 
define UTF-24 as UTF-32 with the most-significant byte (which is always 
zero) removed, hence all characters are stored in exactly three bytes and 
all are treated equally. You could have UTF-24LE and UTF-24BE variants, 
and even UTF-24 BOMs. Of course, I'm not suggesting this is a particularly 
brilliant idea, but I just wonder why no-one's suggested it before.
UTF-24 already exists as an encoding form (it is identical to UTF-32), if 
you consider that encoding forms only need to be able to represent the 
valid code range within a single code unit.
UTF-32 is not meant to be restricted to 32-bit representations.

However it's true that UTF-24BE and UTF-24LE could be useful as encoding 
schemes for serialization to byte-oriented streams, suppressing one 
unnecessary byte per code point.
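
Purely as an illustration (Python; this hypothetical UTF-24BE is not a
standard scheme), such a serialization is trivial to sketch:

    def utf24be_encode(text):
        # each code point as three big-endian bytes (UTF-32BE minus the zero byte)
        return b''.join(ord(ch).to_bytes(3, 'big') for ch in text)

    def utf24be_decode(data):
        if len(data) % 3:
            raise ValueError('truncated UTF-24BE data')
        return ''.join(chr(int.from_bytes(data[i:i+3], 'big'))
                       for i in range(0, len(data), 3))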

(And then of course, there's UTF-21, in which blocks of 21 bits are 
concatenated, so that eight Unicode characters will be stored in every 21 
bytes - and not to mention UTF-20.087462841250343, in which a plain text 
document is simply regarded as one very large integer expressed in radix 
1114112, and whose UTF-20.087462841250343 representation is simply that 
number expressed in binary. But now I'm getting /very/ silly - please 
don't take any of this seriously.)  :-)
I don't think that UTF-21 would be useful as an encoding form, but possibly 
as an encoding scheme where 3 always-zero bits would be stripped, providing a 
tiny level of compression, which would only be justified for transmission over 
serial or network links.

However I do think that such an optimization would have the effect of 
removing byte alignment, on which more powerful compressors rely. If 
you really need more effective compression, use SCSU or apply some deflate 
or bzip2 compression to UTF-8, UTF-16, or UTF-24/32... (there's not much 
difference between compressing UTF-24 or UTF-32 with generic compression 
algorithms like deflate or bzip2).

The UTF-24 thing seems a reasonably sensible question though. Is it just 
that we don't like it because some processors have alignment restrictions 
or something?
There do exist, even today, 4-bit processors, and 1-bit processors, 
where the smallest addressable memory unit is smaller than 8 bits. They are 
used for low-cost micro-devices, notably to build automated robots for 
industry, or even for many home/kitchen devices. I don't know whether they 
need Unicode to represent international text, given that they often have a 
very limited user interface, incapable of inputting or outputting text, but who 
knows? Maybe they are used in some mobile phones, or within smart 
keyboards or tablets or other input devices connected to PCs...

There also exist systems where the smallest addressable memory cell is a 
9-bit byte. This is more of an issue here, because the Unicode standard does 
not specify whether encoding schemes (which serialize code points to bytes) 
should set the 9th bit of each byte to 0, or should fill every 8 bits of 
memory, even if this means that the 8-bit bytes of UTF-8 will not be 
aligned with the memory's 9-bit bytes.

Somebody already introduced UTF-9 in the past for 9-bit systems.
A 36-bit processor could as well address the memory by cells of 36 bits, 
where the 4 highest bits would be either used for CRC control bits 
(generated and checked automatically by the processor or a memory bus 
interface within memory regions where this behavior would be allowed), or 
used to store supplementary bits of actual data (in unchecked regions 
that fit in reliable and fast memory, such as the internal memory cache of 
the CPU, or static CPU registers).

For such things, the impact of the transformation of addressable memory 
widths across interfaces is for now not discussed in Unicode, which 
assumes that internal memory is necessarily addressed in units that are a power 
of 2 and a multiple of 8 bits wide, and then interchanged or stored using this byte unit.

Today we are witnessing a constant expansion of bus widths to allow parallel 
processing instead of multiplying the working frequency (and the energy 
spent and heat generated, which cause other environmental problems), so why 
would the 8-bit byte remain the most efficient universal unit? If you 
look at IEEE floating-point formats, they are often implemented in FPUs 
working on 80-bit units, and an 80-bit memory cell could well become 
a standard tomorrow (compatible with the increasingly used 64-bit 
architectures of today) which would no longer be a power of 2 (even if this 
stays a multiple of 8 bits).

On an 80-bit system, the easiest solution for handling UTF-32 without using 
too much space would be a unit of 40 bits (i.e. two code points per 80-bit 
memory cell). But if you consider that only 21 bits are used in Unicode, 

Re: Nicest UTF

2004-12-06 Thread D. Starner
(Sorry for sending this twice, Marcin.)

Marcin 'Qrczak' Kowalczyk writes: 
 UTF-8 is poorly suitable for internal processing of strings in a 
 modern programming language (i.e. one which doesn't already have a 
 pile of legacy functions working on bytes, but which can be designed 
 to make Unicode convenient at all). It's because code points have 
 variable lengths in bytes, so extracting individual characters is 
 almost meaningless (unless you care only about the ASCII subset, and 
 sequences of all other characters are treated as non-interpreted bags 
 of bytes). You can't even have a correct equivalent of C isspace(). 
 
That's assuming that the programming language is similar to C and Ada. 
If you're talking about a language that hides the structure of strings 
and has no problem with variable length data, then it wouldn't matter 
what the internal processing of the string looks like. You'd need to 
use iterators and discourage the use of arbitrary indexing, but arbitrary 
indexing is rarely important. 
 
You could hide combining characters, which would be extremely useful if 
we were just using Latin and Cyrillic scripts. You'd have to be flexible, 
since it would be natural to step through a Hebrew or Arabic string as if the 
vowels were written inline, and people might want to look at the combining 
characters (which would be incredibly rare if your language already 
provided most standard Unicode functions.) 
 
-- 





Re: Nicest UTF

2004-12-05 Thread Doug Ewell
Asmus Freytag asmusf at ix dot netcom dot com wrote:

 Given this little model and some additional assumptions about your
 own project(s), you should be able to determine the 'nicest' UTF for
 your own performance-critical case.

This is absolutely correct.  Each situation may have different needs and
constraints, and these should govern which UTF is best suited for the
task.  No one UTF is better than the others in all cases.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/





Re: Nicest UTF

2004-12-05 Thread Doug Ewell
Lars Kristan wrote:

 I think UTF8 would be the nicest UTF.

 I agree. But not for reasons you mentioned. There is one other
 important advantage: UTF-8 is stored in a way that permits storing
 invalid sequences. I will need to elaborate that, of course.

I could not disagree more with the basic premise of Lars' post.  It is a
fundamental and critical mistake to try to extend Unicode with
non-standard code unit sequences to handle data that cannot be, or has
not been, converted to Unicode from a legacy standard.  This is not what
any character encoding standard is for.

 1.2 - Any data for which encoding is not known can only be stored in a
 UTF-16 database if it is converted. One needs to choose a conversion
 (say Latin-1, since it is trivial). When a user finds out that the
 result is not appealing, the data needs to be converted back to the
 original 8-bit sequence and then the user (or an algorithm) can try
 various encodings until the result is appealing.

This is simply what you have to do.  You cannot convert the data into
Unicode in a way that says I don't know how to convert this data into
Unicode.  You must either convert it properly, or leave the data in its
original encoding (properly marked, preferably).

It is just as if a German speaker wanted to communicate a word or phrase
in French that she did not understand.  She could find the correct
German translation and use that, or she could use the French word or
phrase directly (moving the translation burden onto the listener).  What
she cannot do is extend German by creating special words that are
placeholders for French words whose meaning she does not know.

 2.2 - Any data for which encoding is not known can simply be stored
 as-is.

NO.  Do not do this, and do not encourage others to do this.  It is not
valid UTF-8.

Among other things, you run the risk that the mystery data happens to
form a valid UTF-8 sequence, by sheer coincidence.  The example of
NESTLÉ™ in Windows CP1252 is applicable here.  The last two bytes are
C9 99, a valid UTF-8 sequence for U+0259.  By applying the concept of
adaptive UTF-8 (as Dan Oscarsson called it in 1998), this sequence
would be interpreted as valid UTF-8, and data loss would occur.
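
A quick check of that coincidence (an illustrative Python sketch; the
trailing trademark sign is what supplies the 99 byte):

    legacy = 'NESTLÉ™'.encode('cp1252')   # CP1252 bytes; the last two are C9 99
    print(legacy[-2:].hex())              # 'c999'
    print(legacy[-2:].decode('utf-8'))    # 'ə' (U+0259): valid UTF-8 by accident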

 2.4 - Any data that was stored as-is may contain invalid sequences,
 but these are stored as such, in their original form. Therefore, it is
 possible to raise an exception (alert) when the data is retrieved.
 This warns the user that additional caution is needed. That was not
 possible in 1.4.

This is where the fatal mistake is made.  No matter what Unicode
encoding form is used, its entire purpose is to encode *Unicode code
points*, not to implement a two-level scheme that supports both Unicode
and non-Unicode data.  What sort of exception is to be raised?  What
sort of additional caution should the user take?  What if this process
is not interactive, and contains no user intervention?

 3.1 - Unfortunately we don't live in either of the two perfect worlds,
 which makes it even worse. A database on UNIX will typically be (or
 can be made to be) 8-bit. Therefore perfectly able to handle UTF-8
 data. On Windows however, there is a lot of support for UTF-16, but
 trying to work in UTF-8 could prove to be a handicap, if not close to
 impossible.

UTF-8 and UTF-16, used correctly, are perfectly interchangeable.  It is
not in any way a fault of UTF-16 that it cannot be used to store
arbitrary binary data.

 3.3 - For the record: other UTF formats CAN be made equally useful to
 UTF-8. It requires 128 codepoints. Back in 2002, I tried to
 convince people on the Unicode mailing list that this should be done,
 but failed.

Because it is an incredibly bad idea.

 I am now using the PUA for this purpose. And I am even tempted to hope
 nobody will ever realize the need for these 128 codepoints, because
 then all my data will be non-standard.

You *should* use the PUA for this purpose.  It is an excellent
application of the PUA.  But do not be surprised if someone else,
somewhere, decides to use the same 128 PUA code points for some other
purpose.  That does not make your data non-standard, because all PUA
data, by definition, is non-standard.  What you are doing with the PUA
is far more standard, and far more interoperable, than writing invalid
UTF-8 sequences and expecting parsers to interpret them as undeciphered
8-bit legacy text of some sort.

 4.1 - UTF-32 is probably very useful for certain string operations.
 Changing case for example. You can do it in-place, like you could
 with ASCII. Perhaps it can even be done in UTF-8, I am not sure. But
 even if it is possible today, it is definitely not guaranteed that it
 will always remain so, so one shouldn't rely on it.

Not only is this not 100% true, as others have pointed out, but it is
completely irrelevant to your other points.

 4.2 - But UTF-8 is superior. You can make UTF-8 functions ignore
 invalid sequences and preserve them. But as soon as you convert

Re: Nicest UTF

2004-12-05 Thread Doug Ewell
Philippe Verdy verdy underscore p at wanadoo dot fr wrote:

 I appreciate Philippe's support of SCSU, but I don't think *even I*
 would recommend it as an internal storage format.  The effort to
 encode and decode it, while by no means Herculean as often perceived,
 is not trivial once you step outside Latin-1.

 I said: for immutable strings, which means that these Strings are
 instantiated for the long term, and multiple reuses. In that sense, what
 is really significant is its decoding, not the effort to encode it
 (which is minimal for ISO-8859-1 encoded source texts, or Unicode
 UTF-encoded texts that only use characters from the first page).

 Decoding SCSU is very straightforward, even if this is stateful (at
 the internal character level). But for immutable strings, there's no
 need to handle various initial states, and the states associated with
 each conponent character of the string has no importance (strings
 being immutable, only the decoding of the string as a whole makes
 sense).

Here is a string, expressed as a sequence of bytes in SCSU:

05 1C 4D 6F 73 63 6F 77 05 1D 20 69 73 20 12 9C BE C1 BA B2 B0 2E

See how long it takes you to decode this to Unicode code points.  (Do
not refer to UTN #14; that would be cheating. :-)

It may not be rocket science, but it is not trivial.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/





Re: Nicest UTF

2004-12-05 Thread Philippe Verdy
- Original Message - 
From: Marcin 'Qrczak' Kowalczyk [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Sunday, December 05, 2004 1:37 AM
Subject: Re: Nicest UTF


Philippe Verdy [EMAIL PROTECTED] writes:
There's nothing that requires the string storage to use the same
exposed array,
The point is that indexing should better be O(1).
SCSU is also O(1) in terms of indexing complexity... simply because it keeps 
the exact equivalence with codepoints, and requires a *fixed* (and small) 
number of steps to decode it to code points, but also because the decoder 
state uses a *fixed* (and small) number of variables for the internal 
context (unlike more powerful compression algorithms like dictionary-based, 
Lempel-Ziv-Welch-like algorithms such as deflate).

Not having a constant size per code point requires one of three things:
1. Using opaque iterators instead of integer indices.
2. Exposing a different unit in the API.
3. Living with the fact that indexing is not O(1) in general; perhaps
  with clever caching it's good enough in common cases.
Although all three choices can work, I would prefer to avoid them.
If I had to, I would probably choose 1. But for now I've chosen a
representation based on code points.
Anyway, each time you use an index to access to some components of a
String, the returned value is not an immutable String, but a mutable
character or code unit or code point, from which you can build
*other* immutable Strings
No, individual characters are immutable in almost every language.
But individual characters do not always have any semantics. For languages, 
the relevant unit is almost always the grapheme cluster, not the character 
(so not its code point...). As grapheme clusters need to be represented with 
variable lengths, an algorithm that could only work with fixed-width units 
would not work internationally, or would cause serious problems for correct 
analysis or transformation of real languages.

Assignment to a character variable can be thought of as changing the
reference to point to a different character object, even if it's
physically implemented by overwriting raw character code.
When you do that, the returned character or code unit or code point
does not guarantee that you'll build valid Unicode strings. In fact,
such a character-level interface is not enough to work with and
transform Strings (for example it does not work to perform correct
transformation of lettercase, or to manage grapheme clusters).
This is a different issue. Indeed transformations like case mapping
work in terms of strings, but in order to implement them you must
split a string into some units of bounded size (code points, bytes,
etc.).
Yes, but why do you want this intermediate unit to be the code point? Such an 
algorithm can be developed with any UTF, or even with compressed encoding 
schemes, through accessor or enumerator methods...

All non-trivial string algorithms boil down to working on individual
units, because conditionals and dispatch tables must be driven by
finite sets. Any unit of a bounded size is technically workable, but
they are not equally convenient. Most algorithms are specified in
terms of code points, so I chose code points for the basic unit in
the API.
Most is the right term here: this is not a requirement, and the fact that 
it is the simplest way to implement such an algorithm does not mean it will be 
the most efficient in terms of performance or resource allocation. Experience 
shows that the most efficient algorithms are also complex to 
implement.

Code points are probably the easiest way to describe what a text 
algorithm is supposed to do, but this is not a requirement for applications 
(in fact many libraries have been written that correctly implement the 
Unicode algorithms without even dealing with code points, but only with 
in-memory code units of UTF-16 or even UTF-8 or GB18030, or directly with 
serialization bytes of UTF-16LE or UTF-8 or SCSU or other encoding schemes).

Which representation will be best is left to implementers, but I really think 
that compressed schemes are often introduced to increase application 
performance and reduce the needed resources, both in memory and for I/O, but 
also in networking, where interoperability across systems and bandwidth 
optimization are also important design goals...




Re: Nicest UTF

2004-12-05 Thread Marcin 'Qrczak' Kowalczyk
Philippe Verdy [EMAIL PROTECTED] writes:

 The point is that indexing should better be O(1).

 SCSU is also O(1) in terms of indexing complexity...

It is not. You can't extract the nth code point without scanning the
previous n-1 code points.

 But individual characters do not always have any semantic. For
 languages, the relevant unit is almost always the grapheme cluster,
 not the character (so not its code point...).

How do you determine the semantics of a grapheme cluster? Answer: by
splitting it into code points. A code point is atomic, it's not split
any more, because there is a finite number of them.

When a string is exchanged with another application or network
computer or the OS, it always uses some encoding which is closer to
code points than to grapheme clusters, no matter if it's UTF-8 or
UTF-16 or ISO-8859-something. If the string was originally stored as
an array of grapheme clusters, it would have to be translated to code
points before further conversion.

 Which representation will be best is left to implementers, but I really
 think that compressed schemes are often introduced to increase
 application performance and reduce the needed resources, both in
 memory and for I/O, but also in networking where interoperability
 across systems and bandwidth optimization are also important design
 goals...

UTF-8 is much better for interoperability than SCSU, because it's
already widely supported and SCSU is not.

It's also easier to add support for UTF-8 than for SCSU. UTF-8 is
stateless, SCSU is stateful - this is very important. UTF-8 is easier
to encode and decode.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/



Re: Nicest UTF

2004-12-05 Thread Philippe Verdy
From: Marcin 'Qrczak' Kowalczyk [EMAIL PROTECTED]
Philippe Verdy [EMAIL PROTECTED] writes:
The point is that indexing should better be O(1).
SCSU is also O(1) in terms of indexing complexity...
It is not. You can't extract the nth code point without scanning the
previous n-1 code points.
The question is why you would need to extract the nth codepoint so blindly. 
If you have such reasons, because you know the context in which this index 
is valid and usable, then you can as well extract a sequence using an index 
in the SCSU encoding itself using the same knowledge.

Linguistically, extracting a substring or characters at any random index in 
a sequence of code points will only cause you problems. In general, you will 
more likely use an index as a way to mark a known position that you have 
already parsed sequentially in the past.

However it is true that if you have determined a good index position to 
allow future extraction of substrings, SCSU will be more complex because you 
not only need to remember the index, but also the current state of the SCSU 
decoder, to allow decoding characters encoded starting at that index. This 
is not needed for UTFs and most legacy character encodings, or national 
standards, or GB18030 which looks like a valid UTF, even though it is not 
part of the Unicode standard itself.

But remember the context in which this discussion was introduced: which UTF 
would be the best to represent (and store) large sets of immutable strings. 
The discussion about indexes in substrings is not relevant in that 
context.




Re: Nicest UTF

2004-12-05 Thread Marcin 'Qrczak' Kowalczyk
Philippe Verdy [EMAIL PROTECTED] writes:

 The question is why you would need to extract the nth codepoint so
 blindly.

For example I'm scanning a string backwards (to remove '\n' at the
end, to find and display the last N lines of a buffer, to find the
last '/' or last '.' in a file name). SCSU in general supports
traversal only forwards.

 But remember the context in which this discussion was introduced:
 which UTF would be the best to represent (and store) large sets of
 immutable strings. The discussion about indexes in substrings is not
 relevant in that context.

It is relevant. A general purpose string representation should support
at least a bidirectional iterator, or preferably efficient random access.
Neither is possible with SCSU.

* * *

Now consider scanning forwards. We want to strip a beginning of a
string. For example the string is an irc message prefixed with a
command and we want to take the message only for further processing.
We have found the end of the prefix and we want to produce a string
from this position to the end (a copy, since strings are immutable).

With any stateless encoding a suitable library function will compute
the length of the result, allocate memory, and do an equivalent of
memcpy.

With SCSU it's not possible to copy the string without analysing it
because the prefix might have changed the state, so the suffix is not
correct when treated as a standalone string. If the stripped part is
short and the remaining part is long, it might pay off to scan the
part we want to strip and perform a shortcut of memcpy if the prefix
did not change the state (which is probably a common case). But in
general we must recompress the whole copied part! We can't even
precalculate its physical size. Decompressing into temporary memory
will negate benefits of a compressed encoding, so we should better
decompress and compress in parallel into a dynamically resizing
buffer. This is ridiculously complex compared to a memcpy.
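
A tiny illustration of the contrast (Python; the message text is made up):
with a stateless encoding such as UTF-8 the copied suffix is just a byte
slice and is already a valid standalone string, which is exactly what SCSU
cannot guarantee.

    msg = ':nick!user@host PRIVMSG #chan :Привет'.encode('utf-8')
    cut = msg.index(b' :') + 2       # position found by earlier forward scanning
    suffix = msg[cut:]               # a plain byte copy, no re-encoding needed
    print(suffix.decode('utf-8'))    # 'Привет' - still well-formed UTF-8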

The *only* advantage of SCSU is that it takes little space. Although
in most programs most strings are ASCII, and SCSU never beats
ISO-8859-1 which is what the implementation of my language is using
for strings with no characters above U+00FF, so it usually does
not have even this advantage.

Disadvantages are everywhere else: every operation which looks at the
contents of a string or produces contents of a string is more complex.
Some operations can't be supported at all with the same asymptotic
complexity, so the API would have to be changed as well to use opaque
iterators instead of indices. It's more complicated both for internal
processing and for interoperability (unless the other end understands
SCSU too, which is unlikely).

Plain immutable character arrays are not completely universal either
(e.g. they are not sufficient for a buffer of a text editor), but they
are appropriate as the default representation for common cases; for
representing filenames, URLs, email addresses, computer language
identifiers, command line option names, lines of a text file, messages
in a dialog in a GUI, names of columns of a database table etc. Most
strings are short and thus performing a physical copy when extracting
a substring is not disastrous. But the complexity of SCSU is too bad.

-- 
   __( Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/



Re: Nicest UTF

2004-12-05 Thread Philippe Verdy
From: Marcin 'Qrczak' Kowalczyk [EMAIL PROTECTED]
Now consider scanning forwards. We want to strip a beginning of a
string. For example the string is an irc message prefixed with a
command and we want to take the message only for further processing.
We have found the end of the prefix and we want to produce a string
from this position to the end (a copy, since strings are immutable).
None of those is a demonstration: decoding IRC commands or similar things 
does not constitute a need to encode large sets of text. In your 
examples, you show applications that need to handle, locally, some strings 
made for computer languages.

Texts of human languages, or even a collection of person names, or places 
are not like this, and have a much wider variety, but with huge 
possibilities for data compression (inherent to the phonology of human 
languages and their overall structure, but also due to repetitive 
conventions spread throughout the text to allow easier reading and 
understanding).

Scanning backward a person name or human text is possibly needed locally, 
but such text has a strong forward directionality without which it does not 
make sense. Same thing if you scan such text starting at random positions: 
you could make many false interpretations of this text by extracting random 
fragments like this.

Anyway, if you have a large database of texts to process or even to index, 
you will, in the end, need to scan this text linearly first from the beginning 
to the end, if only to create an index for accessing it later 
randomly. You will still need to store the indexed text somewhere, and in 
order to maximize the performance, or responsiveness, of your application, 
you'll need to minimize its storage: that's where compression takes place. 
This does not change the semantics of the text, and does not remove them; 
it is still an optimization, and it does not prevent 
further access through a more easily parsable representation as stateless streams 
of characters, via surjective (sometimes bijective) converters between 
the compressed and uncompressed forms.

My conclusion: there's no best representation to fit all needs. Each 
representation has its merits in its domain. The Unicode UTFs are excellent 
only for local processing of limited texts, but they are not necessarily the 
best for long term storage or for large text sets.

And even for texts that will be accessed frequently, compressed schemes can 
still constitute optimizations, even if these texts need to be decompressed 
each time they are needed. I am clearly against arguments that one scheme 
fits all needs, even if you think that UTF-32 is the only viable long-term 
solution.




Fw: Nicest UTF

2004-12-05 Thread Philippe Verdy
From: Doug Ewell [EMAIL PROTECTED]
Here is a string, expressed as a sequence of bytes in SCSU:
05 1C 4D 6F 73 63 6F 77 05 1D 20 69 73 20 12 9C BE C1 BA B2 B0 2E
See how long it takes you to decode this to Unicode code points.  (Do
not refer to UTN #14; that would be cheating. :-)
Without looking at it, it's easy to see that this stream is separated into
three sections, initiated by 05 1C, then 05 1D, then 12. I can't remember
without looking at the UTN which Unicode code point ranges they select, but
the other bytes are simple offsets relative to the start of the selected
ranges. Also, the third section is ended by a regular dot (2E) in the ASCII
range selected for the low half-page, and the other bytes are offsets into
the script block initiated by 12.

Immediately I can identify this string, without looking at any table:
Mossov? is ??.
where the mark after Mossov stands for some opening or closing quotation
mark and where each ? replaces a character that I can't decipher only
through my defective memory. (I don't need to remember the details of the
standard table of ranges, because I know that this table is complete in a
small and easily available document).
A computer can do this much better than I can (it can also know, much
better than I can, what corresponds to a given code point like U+6327, if it
is actually assigned; I would have to look into a specification or use a
charmap tool if I'm not used to entering this character in my texts).
The decoder part of SCSU still remains extremely trivial to implement, 
given
the small but complete list of codes that can alter the state of the
decoder, because there's no choice in its interpretation and because the 
set
of variables to store the decoder state is very limited, as well as the
number of decision tests at each step. This is a basic finite state 
automata.
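
As a rough check of that claim, here is a C sketch of just the decoder
subset needed for the example stream: single-byte mode with the SQ0..SQ7 and
SC0..SC7 tags. The window tables are recalled from UTS #6 and should be
verified against the spec; Unicode mode, window redefinition and the
remaining tags are deliberately left out, so this illustrates the state
machine rather than being a conforming decoder:

    #include <stdint.h>
    #include <stdio.h>
    #include <stddef.h>

    /* Default window bases as recalled from UTS #6 (check against the spec). */
    static const uint32_t static_win[8] = {
        0x0000, 0x0080, 0x0100, 0x0300, 0x2000, 0x2080, 0x2100, 0x3000
    };
    static const uint32_t dynamic_win[8] = {
        0x0080, 0x00C0, 0x0400, 0x0600, 0x0900, 0x3040, 0x30A0, 0xFF00
    };

    /* Partial decoder: single-byte mode only, SQn and SCn tags only. */
    static void decode_subset(const unsigned char *in, size_t len)
    {
        int active = 0;                                       /* active dynamic window */
        for (size_t i = 0; i < len; i++) {
            unsigned char b = in[i];
            if (b >= 0x01 && b <= 0x08 && i + 1 < len) {      /* SQn: quote one char */
                unsigned char n = in[++i];
                uint32_t cp = (n < 0x80) ? static_win[b - 1] + n
                                         : dynamic_win[b - 1] + (n - 0x80);
                printf("U+%04X ", (unsigned)cp);
            } else if (b >= 0x10 && b <= 0x17) {              /* SCn: switch window */
                active = b - 0x10;
            } else if (b >= 0x20 && b < 0x80) {               /* literal ASCII */
                printf("U+%04X ", (unsigned)b);
            } else if (b >= 0x80) {                           /* offset in active window */
                printf("U+%04X ", (unsigned)(dynamic_win[active] + (b - 0x80)));
            } /* all other tags are ignored in this sketch */
        }
        printf("\n");
    }

    int main(void)
    {
        const unsigned char msg[] = {
            0x05, 0x1C, 0x4D, 0x6F, 0x73, 0x63, 0x6F, 0x77, 0x05, 0x1D,
            0x20, 0x69, 0x73, 0x20, 0x12, 0x9C, 0xBE, 0xC1, 0xBA, 0xB2, 0xB0, 0x2E
        };
        decode_subset(msg, sizeof msg);   /* prints the code points of the example */
        return 0;
    }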

Only the encoder may be a bit complex to write (if one wants to generate the
optimal smallest result size), but even a moderate programmer could find a
simple and working scheme with a still excellent compression rate (around 1
to 1.2 bytes per character on average for any Latin text, and around 1.2 to
1.5 bytes per character for Asian texts, which would still be a good
application of SCSU compared to UTF-32 or even UTF-8).




SCSU as internal encoding (was: Re: Nicest UTF)

2004-12-05 Thread Doug Ewell
Philippe Verdy verdy underscore p at wanadoo dot fr wrote:

 The point is that indexing should better be O(1).

 SCSU is also O(1) in terms of indexing complexity... simply because it
 keeps the exact equivalence with codepoints, and requires a *fixed*
 (and small) number of steps to decode it to code points, but also
 because the decoder states uses a *fixed* (and small) number of
 variables for the internal context (unlike more powerful compression
 algorithms like dictionary-based, Lempel-Ziv-Welch-like algorithms
 such as deflate).

As Marcin said, SCSU is O(n) in terms of indexing complexity, because
you have to decode the first (n - 1) characters before you can decode
the n'th.  Even when you have a run of ASCII bytes between 0x20 and
0x7E, there is no guarantee that the characters are Basic Latin.  There
might have been a previous SCU tag that switched into Unicode mode.
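
To spell out why this forces a linear scan, here is a simplified C sketch
(my own illustration, not library code): it tracks only the single-byte vs.
Unicode mode flag and treats most tags loosely, but it shows that byte i
cannot be interpreted without examining every byte before it.

    #include <stddef.h>

    typedef enum { SINGLE_BYTE, UNICODE_MODE } ScsuMode;

    /* Return the byte offset where the target_index-th code point starts,
       or (size_t)-1 if the input ends first.  Deliberately simplified:
       only SCU, SQn, SCn and UCn tags are recognized; surrogate pairs and
       the remaining tags (SQU, SDn, SDX, UQU, UDn, UDX) are not handled. */
    size_t scsu_offset_of_codepoint(const unsigned char *in, size_t len,
                                    size_t target_index)
    {
        ScsuMode mode = SINGLE_BYTE;
        size_t count = 0;
        for (size_t i = 0; i < len; i++) {
            size_t start = i;
            unsigned char b = in[i];
            if (mode == SINGLE_BYTE) {
                if (b == 0x0F) { mode = UNICODE_MODE; continue; }        /* SCU */
                if (b >= 0x10 && b <= 0x17) continue;                    /* SCn */
                if (b >= 0x01 && b <= 0x08 && i + 1 < len) i++;          /* SQn + operand */
                /* anything else counts as one character here */
            } else {
                if (b >= 0xE0 && b <= 0xE7) { mode = SINGLE_BYTE; continue; } /* UCn */
                if (i + 1 < len) i++;   /* a UTF-16 code unit is two bytes */
            }
            if (count == target_index) return start;
            count++;
        }
        return (size_t)-1;
    }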

 No, individual characters are immutable in almost every language.

 But individual characters do not always have any semantics. For
 languages, the relevant unit is almost always the grapheme cluster,
 not the character (so not its code point...). As grapheme clusters
 need to be represented with variable lengths, an algorithm that could
 only work with fixed-width units would not work internationally or
 would cause serious problems for correct analysis or transformation of
 true languages.

This is beside the point, as I said at the outset.  In programming, you
have to deal with individual characters in a string on a regular basis,
even if some characters depend on others from a linguistic standpoint.

 Code points are probably the easiest thing to describe what a text
 algorithm is supposed to do, but this is not a requirement for
 applications (in fact many libraries have been written that correctly
 implement the Unicode algorithms, without even dealing with code
 points, but only with in-memory code units of UTF-16 or even in UTF-8
 or GB18030, or directly with serialization bytes of UTF-16LE or UTF-8
 or SCSU or other encoding schemes).

Algorithms that operate on CES-specific code units are what lead to such
wonderful innovations as CESU-8.  All text operations, except for
encoding and decoding, should work with code points.

Marcin qrczak at knm dot org dot pl responded:

 UTF-8 is much better for interoperability than SCSU, because it's
 already widely supported and SCSU is not.

True, but not really Philippe's point.

Philippe again:

 The question is why you would need to extract the nth codepoint so
 blindly. If you have such reasons, because you know the context in
 which this index is valid and usable, then you can just as well extract
 a sequence using an index into the SCSU encoding itself, using the same
 knowledge.

 Linguistically, extracting a substring or characters at any random
 index in a sequence of code points will only cause you problems. In
 general, you will more likely use an index as a way to mark a known
 position that you have already parsed sequentially in the past.

You have to do this ALL THE TIME in programming.

Example: searching and replacing text.  To search a string for a
substring, you would normally write a function that would not only give
a yes/no answer (i.e. this string does/does not contain the
substring), but would also indicate *where* the substring was found
within the string.  That's because the world needs not only search
tools, but also search-and-replace tools, and you need to know where the
substring is in order to replace it with another.  "Linguistically" has
nothing to do with it.  Nothing prevents the user of a
search-and-replace tool from doing something linguistically unsound, nor
should it.
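
For concreteness, a minimal sketch of "search returns a position, replace
uses it", written over a hypothetical flat code point array rather than any
particular library's API:

    #include <stddef.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct { const uint32_t *cp; size_t len; } Str;

    /* Index of the first occurrence of needle in hay, or -1 if absent. */
    ptrdiff_t find(Str hay, Str needle)
    {
        if (needle.len > hay.len) return -1;
        for (size_t i = 0; i + needle.len <= hay.len; i++)
            if (memcmp(hay.cp + i, needle.cp, needle.len * sizeof *hay.cp) == 0)
                return (ptrdiff_t)i;
        return -1;
    }

    /* Build a new string with the first occurrence of needle replaced by repl.
       Error handling (malloc failure) is omitted in this sketch. */
    Str replace_first(Str hay, Str needle, Str repl)
    {
        ptrdiff_t pos = find(hay, needle);
        if (pos < 0) return hay;                 /* nothing to replace */
        size_t n = hay.len - needle.len + repl.len;
        uint32_t *out = malloc(n * sizeof *out);
        memcpy(out, hay.cp, (size_t)pos * sizeof *out);
        memcpy(out + pos, repl.cp, repl.len * sizeof *out);
        memcpy(out + pos + repl.len, hay.cp + pos + needle.len,
               (hay.len - (size_t)pos - needle.len) * sizeof *out);
        return (Str){ out, n };
    }

With raw SCSU the copies could not be done blindly: the replacement and the
tail would have to be re-encoded against whatever window state is active at
that position.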

If you do this in SCSU, you have to keep track of the state of the
decoder within the string (single-byte vs. Unicode mode, current dynamic
window, and position of all dynamic windows).  If you lose track of the
decoder state, you run the risk of corrupting the data.  (Philippe
acknowledged this in his next paragraph.)  You really need to convert
internally to code points in order to do this.  I'm a believer in SCSU
as an efficient storage and transfer encoding, but not as an internal
process code.

 None of those is a demonstration: decoding IRC commands and similar
 things does not establish a need to encode large sets of text. In
 your examples, you show applications that only need to handle, locally,
 some strings made for computer languages.

One of the main stated goals of SCSU was to provide good compression for
small strings.

 Texts of human languages, or even a collection of person names, or
 places are not like this, and have a much wider variety, but with huge
 possibilities for data compression (inherent to the phonology of human
 languages and their overall structure, but also due to repetitive
 conventions spread throughout the text to allow easier reading and
 understanding).

This is where general-purpose compression schemes excel, and should be
considered.  (You might want to 

Re: Nicest UTF

2004-12-05 Thread Doug Ewell
Philippe Verdy verdy underscore p at wanadoo dot fr wrote:

 Here is a string, expressed as a sequence of bytes in SCSU:

 05 1C 4D 6F 73 63 6F 77 05 1D 20 69 73 20 12 9C BE C1 BA B2 B0 2E
   M   o  s  s  o  v   SP  i  s SP  .

 Without looking at it, it's easy to see that this stream is separated
 into three sections, initiated by 05 1C, then 05 1D, then 12. I can't
 remember without looking at the UTN which Unicode code point ranges they
 select, but the other bytes are simple offsets relative to the start of
 the selected ranges. Also, the third section is ended by a regular dot
 (2E) in the ASCII range selected for the low half-page, and the other
 bytes are offsets into the script block initiated by 12.

05 is a static-quote tag which modifies only the next byte.  It doesn't
really initiate a new section; it's intended for isolated characters
where initiating a new section would be wasteful.  The sequences 05 1C
and 05 1D encode the matching double-quote characters U+201C and
U+201D respectively.

12 switches to a new dynamic window -- in this case, window 2, which is
predefined to point to the Cyrillic block -- so it does select a range
as you said.  Also, the ASCII bytes do represent Basic Latin characters.

 Immediately I can identify this string, without looking at any table:

 Mossov? is ??.

 where each ? replaces a character that I can't decipher only through
 my defective memory. (I don't need to remember the details of the
 standard table of ranges, because I know that this table is complete
 in a small and easily available document).

Actually Moscow, not Mossov -- but as you said, this is not
important because a computer would have gotten this arithmetic right.
The actual string is:

“Moscow” is Москва.

 The decoder part of SCSU still remains extremely trivial to implement,
 given the small but complete list of codes that can alter the state of
 the decoder, because there's no choice in its interpretation and
 because the set of variables to store the decoder state is very
 limited, as well as the number of decision tests at each step. This is
 a finite state automata.

I think "extremely trivial" is overstating the case a bit.  It is
straightforward and not very difficult, but still somewhat more complex
than a UTF.  (There had better not be any choice in interpretation, if
we want lossless decompression!)

BTW, the singular is automaton.

 Only the encoder may be a bit complex to write (if one wants to
 generate the optimal smallest result size), but even a moderate
 programmer could find a simple and working scheme with a still
 excellent compression rate (around 1 to 1.2 bytes per character on
 average for any Latin text, and around 1.2 to 1.5 bytes per character
 for Asian texts, which would still be a good application of SCSU
 compared to UTF-32 or even UTF-8).

UTN #14 contains pseudocode for an encoder that beats the Japanese
example in UTS #6 (by one byte, big deal) and can be easily translated
into working code.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/





Re: Nicest UTF

2004-12-05 Thread Doug Ewell
Philippe Verdy verdy underscore p at wanadoo dot fr wrote:

 Only the encoder may be a bit complex to write (if one wants to
 generate the optimal smallest result size), but even a moderate
 programmer could find a simple and working scheme with a still
 excellent compression rate (around 1 to 1.2 bytes per character on
 average for any Latin text, and around 1.2 to 1.5 bytes per character
 for Asian texts, which would still be a good application of SCSU
 compared to UTF-32 or even UTF-8).

If by "Asian texts" you mean CJK ideographs (*), precomposed Hangul, or
Yi syllables, you have no chance of doing better than 2 bytes per
character.  This is because it is not possible in SCSU to set a dynamic
window to any range between U+3400 and U+DFFF, where these characters
reside.  Such a window would be of little use anyway, because real-world
texts using these characters would draw from so many windows that
single-byte mode would be less efficient than Unicode mode, where 2
bytes per character is the norm.  Of course, this is still better than
UTF-32 or UTF-8 for these characters.

For Katakana and Hiragana, you can get the same efficiency with SCSU as
for other small scripts, but very few texts are written in pure kana
except those for young children.

Sorry for missing this point in my earlier post.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/
 (*) No, I'm not interested in arguing over this word.





Re: Nicest UTF

2004-12-04 Thread Philippe Verdy
From: Marcin 'Qrczak' Kowalczyk [EMAIL PROTECTED]
Philippe Verdy [EMAIL PROTECTED] writes:
Random access by code point index means that you don't use strings
as immutable objects,
No. Look at Python, Java and C#: their strings are immutable (don't
change in-place) and are indexed by integers (not necessarily by code
points, but it doesn't change the point).
Those strings are not indexed. They are just accessible through methods or 
accessors that act *as if* they were arrays. There's nothing that requires 
the string storage to use the same exposed array, and in fact you can just 
as well work with immutable strings as if they were vectors of code points, 
vectors of code units, or sometimes vectors of bytes.

Note for example the difference between the .length property of Java arrays, 
and the .length() method of Java String instances...

Note also the fact that the conversion of an array of bytes or code units 
or code points to a String requires distinct constructors, and that the 
storage is copied rather than simply referenced (the main reason being that 
indexed vectors or arrays are mutable in their indexed content, but not 
String instances, which become sharable).

Anyway, each time you use an index to access some components of a String, 
the returned value is not an immutable String, but a mutable character or 
code unit or code point, from which you can build *other* immutable Strings 
(using for example mutable StringBuffers or StringBuilder or similar objects 
in other languages). When you do that, the returned character or code unit 
or code point does not guarantee that you'll build valid Unicode strings. In 
fact, such a character-level interface is not enough to work with and 
transform Strings (for example, it is not sufficient to perform correct 
lettercase transformations, or to manage grapheme clusters). The most 
powerful (and universal) transformations are those that don't use these 
interfaces directly, but work on complete Strings and return complete 
Strings.

The character-level APIs are a convenience for very basic legacy 
transformations, but they alone do not solve most internationalization 
problems; rather, they are used as a protected interface that allows building 
more powerful String-to-String transformations.

Once you realize that, which UTF you use to handle immutable String objects 
is not important, because it becomes part of the blackbox implementation 
of String instances. If you then consider the UTF as a blackbox, the real 
arguments for one UTF or another depend on the set of String-to-String 
transformations you want to use (because it conditions the implementation of 
these transformations); but more importantly, it affects the efficiency of 
the String storage allocation.

For this reason, the blackbox can itself determine which UTF or internal 
encoding is best for performing those transformations: the total volume of 
immutable string instances to handle in memory and the frequency of their 
instantiation determine which representation to use (because large String 
volumes will stress the memory manager, and will seriously impact the 
overall application performance).

Using SCSU for such a String blackbox can be a good option if it effectively 
helps to store many strings in a compact (for global performance) but still 
very fast (for transformations) representation.

Unfortunately, the immutable String implementations in Java or C# or Python 
do not allow the application designer to decide which representation will 
be the best (they are implemented as concrete classes instead of virtual 
interfaces with possible multiple implementations, as they should be; the 
alternative to interfaces would have been class-level methods allowing the 
application to negotiate the tuning parameters with the blackbox class 
implementation).

There are other classes or libraries within which such multiple 
representations are possible and are easily and transparently convertible 
from one to the other. (Note that this discussion concerns the UTF used to 
represent code points, but today there is also a need to work on strings 
within grapheme cluster boundaries, including the various normalization 
forms, and a few libraries do exist in which the normalization can be 
changed without losing the immutability of Strings, the complexity being 
that Strings do not always represent plain text...)




Re: Nicest UTF

2004-12-04 Thread Deborah Goldsmith
On Dec 3, 2004, at 2:54 AM, Andrew C. West wrote:
I strongly agree that all Unicode implementations should cover all of
Unicode, and not just the BMP, and it really annoys me when they don't;
but suggesting that you need to implement supra-BMP characters because
they are going to start popping up all over the place is wrong in my
opinion (not that Doug suggested that, but that's my extrapolation of
his point). Software developers need to implement supra-BMP characters
because some users (probably very few) will from time to time want to
use them, and software should allow people to do what they want.
Actually, about 10% of the glyphs in the Japanese fonts that ship with 
Mac OS X are represented by characters in plane 2. The main reason they 
are there is because they are used in names (people, places, and 
companies). So there are real customers who want to use characters 
outside the BMP. I would not characterize it as "very few". That's true 
of the vast majority of SMP characters, but not all of them.

Deborah Goldsmith
Internationalization, Unicode Liaison
Apple Computer, Inc.
[EMAIL PROTECTED]



Re: Nicest UTF

2004-12-04 Thread Marcin 'Qrczak' Kowalczyk
Philippe Verdy [EMAIL PROTECTED] writes:

 There's nothing that requires the string storage to use the same
 exposed array,

The point is that indexing should better be O(1).

Not having a constant size per code point requires one of three things:
1. Using opaque iterators instead of integer indices.
2. Exposing a different unit in the API.
3. Living with the fact that indexing is not O(1) in general; perhaps
   with clever caching it's good enough in common cases.

Although all three choices can work, I would prefer to avoid them.
If I had to, I would probably choose 1. But for now I've chosen a
representation based on code points.

 Anyway, each time you use an index to access some components of a
 String, the returned value is not an immutable String, but a mutable
 character or code unit or code point, from which you can build
 *other* immutable Strings

No, individual characters are immutable in almost every language.
Assignment to a character variable can be thought of as changing the
reference to point to a different character object, even if it's
physically implemented by overwriting the raw character code.

 When you do that, the returned character or code unit or code point
 does not guarantee that you'll build valid Unicode strings. In fact,
 such a character-level interface is not enough to work with and
 transform Strings (for example, it is not sufficient to perform correct
 lettercase transformations, or to manage grapheme clusters).

This is a different issue. Indeed transformations like case mapping
work in terms of strings, but in order to implement them you must
split a string into some units of bounded size (code points, bytes,
etc.).

All non-trivial string algorithms boil down to working on individual
units, because conditionals and dispatch tables must be driven by
finite sets. Any unit of a bounded size is technically workable, but
they are not equally convenient. Most algorithms are specified in
terms of code points, so I chose code points for the basic unit in
the API.

In fact in my language there is no separate character type: a code
point extracted from a string is represented by a string of length 1.
It doesn't change the fact that indexing a string by code point index
should run in constant time, and thus using UTF-8 internally would be
a bad idea unless we implement one of the three points above.

 Once you realize that, which UTF you use to handle immutable String
 objects is not important, because it becomes part of the blackbox
 implementation of String instances.

The black box must provide enough tools to implement any algorithm
specified in terms of characters, an algorithm which was not already
provided as a primitive by the language.

Algorithms generally scan strings sequentially, but in order to store
positions to come back to them later you must use indices or some
iterators. Indices are simpler (and in my case more efficient).

 Using SCSU for such a String blackbox can be a good option if it
 effectively helps to store many strings in a compact (for global
 performance) but still very fast (for transformations) representation.

I disagree. SCSU can be a separate type to be used explicitly, but
it's a bad idea for the default string representation. Most strings
are short, and thus constant factors and simplicity matter more than
the amount of storage. And you wouldn't save much storage anyway:
as I said, in my representation strings which contain only characters
U+0000..U+00FF are stored one byte per character. The majority of
strings in average programs are ASCII.
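
A sketch of how such a dual-width representation might look (the layout is
my assumption for illustration, not the actual implementation); indexing
stays O(1) either way, only the element width differs:

    #include <stddef.h>
    #include <stdint.h>

    typedef struct {
        int wide;              /* 0: one byte per char (<= U+00FF), 1: four bytes */
        size_t len;            /* length in code points */
        union {
            const uint8_t  *narrow;
            const uint32_t *full;
        } u;
    } Str;

    /* Code point at index i, in constant time for both layouts. */
    static inline uint32_t str_at(const Str *s, size_t i)
    {
        return s->wide ? s->u.full[i] : s->u.narrow[i];
    }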

In general, what I don't like in SCSU is that there is no obvious
compression algorithm which makes good use of its various features. Each
compression algorithm is either not as powerful as it could be, or is
extremely slow (trying various choices), or is extremely complicated
(trying only sensible paths).

 Unfortunately, the immutable String implementations in Java or C#
 or Python do not allow the application designer to decide which
 representation will be the best (they are implemented as concrete
 classes instead of virtual interfaces with possible multiple
 implementations, as they should be; the alternative to interfaces would
 have been class-level methods allowing the application to negotiate
 the tuning parameters with the blackbox class implementation).

Some functions accept any sequence of characters. Other functions
accept only standard strings. The question is how often to use each
style.

Choosing the first option increases flexibility but adds an overhead
in the common case. For example, case mapping of a string would have to
either go through dispatching functions at each step, or be implemented
twice. Currently it's implemented for strings only, in C, and thus
avoids calling a generic indexing function and other overheads. At
some point I will probably implement it again, to work for arbitrary
sequences of characters, but it's more work for effects that I don't
currently need, so it's not a priority.

