Re: [l10n-dev] Request for new native language communities

Eike Rathke Thu, 31 Aug 2006 17:03:55 -0700

Hi Reshat,

On Wed, Aug 30, 2006 at 15:11:13 -1000, "Reshat Sabiq (Reşat)" wrote:


> > Note that currently modifiers or ISO 15924 script codes as drafted
> > in RFC 3066bis are not supported yet. Furthermore I didn't find any
> > normative reference to 'iqte' regarding the Tatar language.
> If the modifiers are expected to be supported in the future, then
> perhaps absence of their support now shouldn't prevent the initiation
> of the project, as it is guaranteed to take a while.

Of course not, and as was already mentioned, creation of a native
language _site_ should normally be targeted at the language, independent
of the script it is written in.

The concerns I expressed are merely from a developer's point of view and
are affected by availability and support of _locales_, not just
languages, because that is what the code mostly deals with. There are
technical restrictions we currently have to live with until a
considerable part of the implementation has been changed to support
script values, and maybe variants and extensions, in locale designators
according to the upcoming RFC 3066bis. This will most certainly not
happen in the very near future.

> As far as iqte, yes i realize that it is not standardized. I was
> hoping that practices would be similar to the "freeform" qualification
> in the following statement on mozilla.org:
> "In some rare cases, we might need the dialect part as a third part
> (3- to 8-letter basically freeform part), we currently can imagine two
> cases there: ..."
> http://wiki.mozilla.org/L10n:Simple_locale_names

This three-part notation, language-region-dialect or
language-region-variant, is currently not supported by OOo.
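
To illustrate the restriction, here is a minimal sketch of a check that
accepts only <language> or <language>-<region> designators. It is not
taken from the OOo sources; the function name, the pattern and the
sample tags are assumptions made up for this example.

```python
import re

# Hypothetical check, not OOo code: accept only <language> or
# <language>-<region>, with a 2- or 3-letter lowercase language code
# and a 2-letter uppercase region code.
_LANG_REGION = re.compile(r"^[a-z]{2,3}(-[A-Z]{2})?$")

def is_language_region(tag):
    """Return True if tag has the plain language[-region] shape."""
    return _LANG_REGION.match(tag) is not None

# Sample tags chosen for illustration only.
for tag in ("tt", "tt-RU", "crh-UA", "roa-IT-vec", "tt-Latn"):
    print(tag, is_language_region(tag))
```

With such a check, a three-part tag like "roa-IT-vec" and a script tag
like "tt-Latn" are both rejected, while "tt-RU" or "crh-UA" pass.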

As a side note, not directly related to the Tatar script problems, I'm
also not convinced that the approach taken by Mozilla in the case of

| 1. there's no ISO 639.2 code for some language that wants to do
|    a localization (e.g. for Venetian Firefox team). In this case, we
|    can use the generic identifier for the language family (romance:
|    roa) from ISO 639.2 as the language code, and add an identifier for
|    the specific language as the dialect (if one exists, we prefer to
|    use the 3-letter SIL code). In the case of venetian, we end up with
|    "roa-IT-vec" this way.

would actually be a good solution for document exchange. Because a
language _family_ designator or collective code is not a real language
designator, using it in that position could fool matching and fallback
mechanisms. Instead, for the rare cases where it is necessary, I'd go
for ISO/DIS 639-3 where available. This would not strictly comply with
RFC 3066, which mentions only 639-1 and 639-2 because ISO/DIS 639-3
didn't exist when it was written (and RFCs are not based on drafts like
a DIS anyway), but it would follow the same <language>-<region> pattern.
This is also what Ethnologue recommends nowadays, see
http://www.ethnologue.com/codes/default.asp#standards
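
A small sketch of how a collective code could fool fallback follows. It
assumes a naive truncation fallback (drop the last subtag until
something matches), which is an assumption for illustration and not a
description of how OOo matches locales. The function names are made up,
and "xxx" is a placeholder for some other Romance language that has its
own code.

```python
def fallback_chain(tag):
    """Return tag followed by progressively shorter prefixes (a-b-c, a-b, a)."""
    parts = tag.split("-")
    return ["-".join(parts[:n]) for n in range(len(parts), 0, -1)]

def best_match(requested, available):
    """Return the first entry of the fallback chain found in available."""
    for candidate in fallback_chain(requested):
        if candidate in available:
            return candidate
    return None

# Venetian data stored under the collective code 'roa' (hypothetical setup):
# a request for a different Romance language falls back to it.
print(best_match("roa-FR-xxx", {"roa"}))

# With a real language code as primary tag, the unrelated request finds
# nothing, while Venetian itself still matches.
print(best_match("xxx-FR", {"vec"}))
print(best_match("vec-IT", {"vec"}))
```

Under the collective code, the first lookup ends at 'roa' and so picks
up Venetian data for a different language. With <language>-<region>
based on a real language code, the same request fails cleanly instead.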

RFC 3066bis would then allow for scripts, variants and extensions, and
a subsequent RFC 3066ter will hopefully adopt the by then finalized
ISO 639-3 codes for primary language tags.
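
As a rough sketch of what handling such tags involves, the following
splits a tag into language, script, region and variants. It is heavily
simplified and is not a conforming RFC 3066bis parser: subtags are
classified only by position, length and character class; extensions and
registry validation are ignored; the function name is made up.

```python
def parse_tag(tag):
    """Split a tag into language, script, region and variants.

    Simplified assumption: an optional 4-letter script comes first, then
    an optional region (2 letters or 3 digits); whatever remains is
    treated as variants.
    """
    subtags = tag.split("-")
    parsed = {"language": subtags[0], "script": None,
              "region": None, "variants": []}
    rest = subtags[1:]
    if rest and len(rest[0]) == 4 and rest[0].isalpha():
        parsed["script"] = rest.pop(0)
    if rest and ((len(rest[0]) == 2 and rest[0].isalpha())
                 or (len(rest[0]) == 3 and rest[0].isdigit())):
        parsed["region"] = rest.pop(0)
    parsed["variants"] = rest
    return parsed

print(parse_tag("tt-Latn"))
print(parse_tag("tt-Latn-RU"))
```

Every place in the code that today assumes a plain language/region pair
would have to carry and compare the additional fields.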


> > If RFC 3066bis was implemented, Tatar written in Latin script would
> > be 'tt-Latn', this would be independent of a region code. However,
> > to form a proper locale a region code is needed, with a few
> > exceptions we agreed on, Esperanto or Interlingua for example.
> Unfortunately, it's not so simple for Idil-Ural (Qazan) Tatar.  There
> are currently about 6 different Latin alphabets in use,

oerghs..

> [...]
> I'm a believer in IQTElif because it is the only alphabet that
> provides similar orthography w/ Crimean Tatar (crh). Given the
> plethora of Latin alphabets, the iqte modifier was to make clear what
> alphabet is used.

All your points are taken; it is just not possible to use this with our
current implementation.


> P.P.S. If we made a team for tt alone, w/ no modifiers, frankly i
> wouldn't know what it would represent. Would it be Cyrillic, or Latin?

What is (a) the most widely used script and (b) the official form? The
official one is Cyrillic, as far as I have understood the matter, so
efforts should probably go in that direction.

> P.P.P.S. Thanks to Crimean Tatars and Ukraine, crh has none of these
> questions, and has one decent Latin alphabet. I guess we could start
> by making a team for crh, and i look forward to feedback on the rest.

At least 'crh' is a valid ISO 639-2 code..

  Eike

-- 
 PGP/OpenPGP/GnuPG encrypted mail preferred in all private communication.
 Key ID: 0x293C05FD - 997A 4C60 CE41 0149 0DB3  9E96 2F1A D073 293C 05FD
