Re: [l10n-dev] sr-CS/sr-YU locale data

Danilo Šegan Wed, 03 Aug 2005 09:40:28 -0700

Hi Eike,

Sorry if I sounded too harsh.  I've simply heard a lot of the same
arguments before, and they only helped delay adoption (the position of
ISO 3166 has not changed).

Yesterday at 19:20, Eike Rathke wrote:

> On Tue, Aug 02, 2005 at 08:29:58 +0200, Danilo Šegan wrote:
>
>> How come nobody noticed such "short-sightedness"
>
> Many people did, and also urged the ISO maintenance agency to
> reconsider, even during the process before the final decision was made.
> Do some research on the net.

I can't find anything.  Care to give me a pointer or two?

I found http://www.statoids.com/w3166his.html, which says:

1993-07-28 (Newsletter III-45): 
           Numeric code of Yugoslavia changed from 890 to 891 (a
           consequence of the splitting off of Bosnia and Herzegovina, 
           Croatia, Macedonia, and Slovenia).

This only indicates that the "YU" code should have been changed as
well, because what it denoted had shrunk in exactly the same way as
what the numeric code denoted.  I can't find any reference to a
complaint about keeping the "YU" code as it was.

>> All applications now need to cope with another such change, and it's
>> even simpler in practice, since there is a gap of some 12 years
>> between the two uses (pre-1990 "CS" and 2003-now "CS").
>
> Still, since a simple "CS" you encounter doesn't say which year it
> originated from, there is no clean way to differentiate between those
> two. You could only _assume_ that the data is not older than 15 years.
> That may be valid for an office application, but not for mainframe
> database applications where data can be much older.

Yeah, it's exactly the same problem.  Since a simple "YU" you
encounter doesn't say which year it originated from, there is no clean
way to differentiate between pre-1991 YU and later-YU. 

ISO 3166 was not designed for the way it was used (basically, it
should always have been used with a date tag next to the code), and
the problem is not in ISO 3166 as such, but that its users need a
different (better) standard: one that doesn't assign codes valid only
in certain time ranges.
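To make the "date tag next to the code" idea concrete, here is a minimal Python sketch of a date-qualified lookup. The table CODE_HISTORY and the function resolve() are hypothetical, and the boundary dates are approximate and only illustrative (check the ISO 3166 Maintenance Agency newsletters before relying on them).

```python
from datetime import date
from typing import Dict, List, Optional, Tuple

# Hypothetical history table: code -> list of (valid_from, valid_until, name).
# Dates are approximate and illustrative only; valid_until=None means the
# assignment was still current at the time of this thread.
CODE_HISTORY: Dict[str, List[Tuple[date, Optional[date], str]]] = {
    "CS": [
        (date(1974, 1, 1), date(1993, 1, 1), "Czechoslovakia"),
        (date(2003, 7, 23), None, "Serbia and Montenegro"),
    ],
    "YU": [
        (date(1974, 1, 1), date(2003, 7, 23), "Yugoslavia"),
    ],
}


def resolve(code: str, on: date) -> Optional[str]:
    """Return what an alpha-2 code denoted on a given date, or None."""
    for valid_from, valid_until, name in CODE_HISTORY.get(code.upper(), []):
        # Ranges are half-open: valid_from <= on < valid_until.
        if valid_from <= on and (valid_until is None or on < valid_until):
            return name
    return None


print(resolve("CS", date(1985, 6, 1)))  # Czechoslovakia
print(resolve("CS", date(2005, 8, 3)))  # Serbia and Montenegro
print(resolve("CS", date(1998, 1, 1)))  # None: code unassigned then
```

Note that even with dates attached, resolve("YU", ...) gives the same name for 1990 and 1995, although those were quite different countries; a date only fixes reuse, not a code that silently changes its meaning.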

Maybe ISO 3166 can evolve into such a standard, but maybe we'd need
another one.  (E.g., if I remember correctly, RFC 3066bis is moving
away from ISO 3166 with a clause like "if a code has been reassigned
in ISO 3166, don't use it in RFC 3066".)

>> Of course, nobody bothered with the former change, because nobody
>> seemed to care, but suddenly it's important for those data
>> applications?  It's all hypocrisy to me.
>
> Software has evolved a lot, especially in the last 10-15 years. Some
> years ago there weren't many applications that were internationalized
> to at least some degree. Only a few could handle data of different
> regional origin, and even fewer did so by using ISO-3166-1 alpha-2
> codes; some did so by using ISO-3166-1 numeric-3 codes that developed
> later (interestingly, in this case the number was reassigned: previously
> 891 was Yugoslavia and now it is Serbia and Montenegro).

The number should have stayed the same NOW: 2003 Yugoslavia was the
same country as 2003 Serbia and Montenegro, only with a different
name; 1990 Yugoslavia was very different from 1992 Yugoslavia, and
that's when both the number AND the code should have been changed if
you're not being hypocritical.  That's another design flaw of ISO 3166:
country codes should not have been based on names, but rather
assigned on a first-come, first-served (or FIFO ;-) basis (most
countries would get theirs based on names, but it would not be a
requirement).
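A first-come, first-served scheme that never reuses codes could look roughly like the following Python sketch. The CodeRegistry class and its name-derived candidate heuristic are hypothetical illustrations of the idea, not anything ISO 3166 specifies: a name-based code is preferred when free, any unused two-letter code is acceptable otherwise, and retired codes stay reserved forever.

```python
import string
from itertools import product
from typing import Dict, List, Set


class CodeRegistry:
    """Hypothetical registry: two-letter codes are handed out first-come,
    first-served and are never reused once retired."""

    def __init__(self) -> None:
        self.active: Dict[str, str] = {}  # code -> entity name
        self.retired: Set[str] = set()    # codes that may never be reissued

    def _candidates(self, entity: str) -> List[str]:
        letters = [c for c in entity.upper() if c in string.ascii_uppercase]
        # Name-based guesses first: first letter + each later letter ...
        named = [letters[0] + c for c in letters[1:]] if letters else []
        # ... then every other two-letter code, in alphabetical order.
        rest = ["".join(p) for p in product(string.ascii_uppercase, repeat=2)]
        return named + rest

    def assign(self, entity: str) -> str:
        for code in self._candidates(entity):
            if code not in self.active and code not in self.retired:
                self.active[code] = entity
                return code
        raise RuntimeError("two-letter code space exhausted")

    def retire(self, code: str) -> None:
        # The entity goes away, but its code stays reserved forever.
        del self.active[code]
        self.retired.add(code)


reg = CodeRegistry()
print(reg.assign("Yugoslavia"))  # YU
reg.retire("YU")
print(reg.assign("Yugoslavia"))  # YG: a new entity never inherits "YU"
```

The name is only a hint here, so a code can never end up meaning two different things, which is exactly the property the reuse clause gives up.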

I'm bringing this example up only because I'm quite familiar with it
(I know how the country changed, yet the code didn't), and because
it illustrates the same problems of ISO 3166, which have been present
for a long time.  I actually had applications where simply using ISO
3166 codes was insufficient (e.g. for indicating the country where a
book was published in library software: a book could not have been
published in 1985/hr, only in 1985/yu).
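For the library case, one way to enforce this is to validate each stored (year, country code) pair against the years a code was in use. The VALID_YEARS table and check_imprint() below are a hypothetical Python sketch, and the year ranges are approximate and only illustrative.

```python
from typing import Dict, Optional, Tuple

# Hypothetical table: code -> (first year in use, last year in use or None).
# Years are approximate and only illustrative.
VALID_YEARS: Dict[str, Tuple[int, Optional[int]]] = {
    "YU": (1974, 2003),
    "HR": (1992, None),
}


def check_imprint(year: int, code: str) -> bool:
    """Accept a publication (year, country code) pair only if the code
    was in use in that year; unknown codes are rejected."""
    span = VALID_YEARS.get(code.upper())
    if span is None:
        return False
    first, last = span
    return first <= year and (last is None or year <= last)


print(check_imprint(1985, "yu"))  # True
print(check_imprint(1985, "hr"))  # False: HR was not in use in 1985
```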

>> For those that don't know, ISO 3166 has *FOREVER* been defined as
>> the following:
>>  - two letter country code resembling country name
>>  - may be changed in accordance with the country name change
>>  - may be reused
>
> They may not be reused for 5 years. It still is short-sighted to reuse
> them after 5 years.
>
>> Now, everybody's broken applications which didn't follow the specs
>
> Which specs?

The above specs: they may be reused after 5 years have passed.

>> are ISO 3166's fault?
>
> In this case IMHO it is. They don't even follow their own goal. Citing
> from the bottom of
> http://www.iso.org/iso/en/prods-services/iso3166ma/04background-on-iso-3166/iso3166-past-present-and-future.html
>
> | With the growing integration of the Internet into many aspects of
> | (business-)life the need for coded information related to geographical
> | concepts like country or place names will continue to rise. ISO 3166-1
> | is going to be one of the ISO standards which help facilitate this
> | integration process - today and tomorrow.
>
> With reused codes there is no tomorrow.

This is open to interpretation.  I think there is a tomorrow even
without the codes, computers, the Internet and whatever.  I guess I'm a
bit more optimistic about humankind's ability to survive :)

ISO 3166-1 is a perfectly suitable standard if you follow everything
it says: you put a date along with any use of a code (this is implicit
in the "reuse" clause).  It just turned out not to be what everybody
thought it was (because people read it while "hoping" that what it
says is not really true).

>> > | - Almost all already existing software only knows about YU, not CS. Not
>> > |   using YU in the file format would result in unrecognized locales in
>> > |   interoperability.
>> 
>> This has changed.  GNU libc in 2.3 and HEAD branches knows only about
>> sr_CS, and not about sr_YU. 
>
> Yes, but it changed only recently.

Not really.  sr_YU was removed from GNU libc a year ago already, and
sr_CS was added only a short time ago.  The reasons for this have
nothing to do with the "CS" code, however; they are rather due to the
strict and conservative policy of the GNU libc maintainers (look at what
Denis Barbier is doing with a separate locale repository for Debian for
the very same reasons).
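On the interoperability worry quoted above (software that only knows sr_YU), one possible approach is to map legacy locale names onto current ones when reading data. The following Python sketch assumes a hypothetical alias table and a simplified language[_TERRITORY][.codeset][@modifier] name syntax; it is not how GNU libc or OOo actually resolve locales.

```python
import re
from typing import Dict, Tuple

# Hypothetical alias table: (language, legacy territory) -> current territory.
TERRITORY_ALIASES: Dict[Tuple[str, str], str] = {("sr", "YU"): "CS"}

# Simplified locale name syntax: language[_TERRITORY][.codeset][@modifier];
# "-" is also accepted as the language/territory separator.
LOCALE_RE = re.compile(
    r"^(?P<lang>[a-z]{2,3})(?:[_-](?P<terr>[A-Za-z]{2}))?"
    r"(?P<codeset>\.[^@]+)?(?P<modifier>@.+)?$"
)


def normalize_locale(name: str) -> str:
    """Rewrite a legacy locale name to its current equivalent, if known.
    Recognised names are always emitted with "_" as the separator."""
    m = LOCALE_RE.match(name)
    if m is None:
        return name  # leave anything we don't understand untouched
    lang = m.group("lang")
    terr = (m.group("terr") or "").upper()
    terr = TERRITORY_ALIASES.get((lang, terr), terr)
    result = lang + ("_" + terr if terr else "")
    return result + (m.group("codeset") or "") + (m.group("modifier") or "")


print(normalize_locale("sr_YU"))        # sr_CS
print(normalize_locale("sr-YU.UTF-8"))  # sr_CS.UTF-8
print(normalize_locale("hu_YU"))        # hu_YU (no alias defined)
```

Aliasing only the (sr, YU) pair rather than every "YU" reflects that a locale name describes current conventions, while a historical record carrying "YU" still needs a date to be interpreted.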

> I won't talk about politics here right now, but the entire story of why
> it took so long and why in the end they used CS and not something else
> IMHO was a political decision, maybe because Serbia And Montenegro
> insisted on using nothing else or whatever, I don't know. There is no
> other real reason why it had to be CS.

ISO 3166 specifies that alpha-2 codes should resemble the country
name.  "CS" does resemble it if you turn it around a bit, and there
was no free code starting with "S".

> For OOo in the end it was simply a technical decision: back in 2003/2004
> the situation was a complete mess, and no library supported CS. This
> changed, and one of the next OOo versions will support sr_CS.

Nice to hear that (though I can't see why we had to go into this
discussion then?).

>> As I said, the problem is that everybody was expecting to use ISO 3166
>> in a different way than it was defined.  And there is NO chance of
>> collision in OpenOffice.org: there was never sr-CS meaning Serbian in
>> Czechoslovakia, so you need not worry.
>
> That's not the point. I could even come and define and use en-CS if
> I wanted to have locale data for English language for use in Serbia And
> Montenegro.

Indeed.  But nobody did define "Serbian in Czechoslovakia", so in
PRACTICE, this is a non-issue.  It's an issue only in theory.  And a
theory which should have materialised in the PAST (it's not going to
happen in the future), so we know that it hasn't happened.

Or are you planning on defining Serbian in Czechoslovakia just to make
this assumption break? :)

>> > Of course we can consider re-evaluation of how OOo will handle this, but
>> > in this special case it is not simply "follow the standard".
>> 
>> If you're into exceptions and special cases, then just use "SCG".
>
> Using alpha-3 code wouldn't comply with RFC 3066.

Yeah.  There was some talk about using them in new releases (exactly
because of the "CS" case, since, if I remember correctly, they're not
going to use it to represent Serbia and Montenegro), but that might not
happen after all (the last I heard on a discussion list was that the
numeric code would be used instead).

>> At the same time, I'd recommend removing sh-* stuff, because it's a
>> hack.
>
> Yes, it's a hack. It's a hack because ...
>
>> For GNU libc, I have [EMAIL PROTECTED] locales ("Latn" is an ISO 15924
>> script identifier).
>
> ... script identifiers aren't supported.

Aren't they? ICU seems to use them, and maybe I'm confusing RFC 3066
with...

> Looking forward to new challenges that will surely arrive with RFC3066bis
> http://www.inter-locale.com/ID/why-rfc3066bis.html

where I'm sure ISO 15924 script identifiers ARE supported (it's easy
to define them, since any alpha-4 subtag positioned right after the
language subtag in a language tag is a script identifier).  Or I'm
completely confusing this with the "tags for identification of
languages" draft:

http://www.simpleweb.org/ietf/internetdrafts/complete/draft-phillips-langtags-03.txt
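As a rough illustration of that rule, here is a Python sketch that splits a tag of the form language[-script][-region], treating a four-letter subtag right after the language as a script and accepting a region as either two letters (ISO 3166-1 alpha-2) or three digits (UN M.49), as in the RFC 3066bis draft (later published as RFC 4646). It deliberately ignores variants, extensions and private-use subtags, so it is a simplified subset rather than a conforming parser; parse_tag() and LangTag are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class LangTag(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]


# Simplified subset of the RFC 3066bis grammar: a 2-3 letter language,
# an optional 4-letter script, then an optional region that is either
# two letters (ISO 3166-1 alpha-2) or three digits (UN M.49).
TAG_RE = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def parse_tag(tag: str) -> Optional[LangTag]:
    """Split a simple language tag; return None if it doesn't fit the subset."""
    m = TAG_RE.match(tag)
    if m is None:
        return None  # variants, extensions, alpha-3 regions etc. not handled
    script = m.group("script")
    region = m.group("region")
    # Normalise case the conventional way: sr, Latn, CS.
    return LangTag(
        language=m.group("language").lower(),
        script=script.title() if script else None,
        region=region.upper() if region else None,
    )


print(parse_tag("sr-Latn-CS"))  # LangTag(language='sr', script='Latn', region='CS')
print(parse_tag("sr-891"))      # LangTag(language='sr', script=None, region='891')
print(parse_tag("sr-SCG"))      # None: an alpha-3 region doesn't fit
```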

Cheers,
Danilo
