On Sat, Mar 16, 2013 at 5:51 AM, Andrea Pescetti <[email protected]> wrote:
> janI wrote:
>>
>> I have the following codes (directories):
>> af brx dz eu he ka ky my om ro ...
>>
>> Where can I find the relation between the directory names and the
>> languages (human names), someone (I think andrea) mentioned it was country
>> codes ?
>
>
> We don't use country codes, we rely on the LANGUAGE codes, which are ISO
> standards. So, in general:
> - if it is a two-letter code, look it up in ISO 639-1:
> http://en.wikipedia.org/wiki/List_of_ISO_639-1_codes ("af" -> "Afrikaans")
> - if it is a three-letter code, use ISO 639-2 or (more complete, extends
> 639-2) 639-3: http://en.wikipedia.org/wiki/List_of_ISO_639-3_codes ("pap" ->
> "Papiamento")
>
>
>> I expected dialects within a language to be written as e.g. es_XX, and I
>> know there is an ongoing effort on translating to
>> Catalan Euskadi and Gallego
>
>
> No, this would be a dangerous approach! There is a lot of "political
> correctness" at work here. Everything that is in ISO is a language. So all
> languages spoken in Spain have equal dignity and their own codes. Catalan is
> "ca", Basque/Euskadi is "eu", Gallego is "gl" and you listed all three of
> them.
>
>
>> I am also a bit puzzled about pt_BR and ca_XV
>
>
> These are extensions made to accommodate language variants. Languages in the
> form '[a-z]*_[A-Z]*' are an internal convention to be read as:
> language_PLACE. So en_US means "English, as spoken in the US"; en_GB =
> "English, as spoken in Great Britain"; pt_BR = "Portuguese, as spoken in
> Brazil"; ca_XV = "Catalan, as spoken in Valencia [or Comunidad Valenciana]".
> zh_CN and zh_TW are often called "simplified" and "traditional" Chinese,
> instead of being linked to China and Taiwan as the two codes would mean.
>
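
To make the convention quoted above concrete, here is a rough Python sketch of how a directory name could be split into its language and place parts. The function names (split_code, which_iso_list) are hypothetical, and the pattern is an assumption: it is stricter than the '[a-z]*_[A-Z]*' form quoted above, in that it requires a 2- or 3-letter language code and, if present, a 2-letter place code.

```python
import re

# Assumed pattern: a lowercase ISO 639 language code (2 or 3 letters),
# optionally followed by "_" and an uppercase 2-letter place code.
# This is tighter than the '[a-z]*_[A-Z]*' form quoted in the thread.
CODE_RE = re.compile(r"^([a-z]{2,3})(?:_([A-Z]{2}))?$")


def split_code(name):
    """Return (language, place) for a directory name; place is None if absent."""
    match = CODE_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognised language code: {name!r}")
    return match.group(1), match.group(2)


def which_iso_list(language):
    """Say which ISO 639 list to consult, following the rule of thumb above."""
    return "ISO 639-1" if len(language) == 2 else "ISO 639-2 or 639-3"


if __name__ == "__main__":
    for name in ["af", "pap", "pt_BR", "ca_XV"]:
        language, place = split_code(name)
        print(name, language, place, which_iso_list(language))
```

The sketch only splits the name; mapping "af" to "Afrikaans" would still need the ISO tables linked above.
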
Do you know why we don't just follow the IETF's recommendations in this
area? They have a similar scheme, BCP 47, but use a hyphen rather than
an underscore, e.g., en-US, pt-BR. This is what is used on the web in
general, e.g., in HTTP headers. See:

http://www.rfc-editor.org/bcp/bcp47.txt

They even take it a step further, which might be useful in some cases.
For example: sr-Latn-RS means Serbian language written in Latin script,
as used in Serbia.

-Rob

> Regards,
> Andrea.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
