Our (three) Locale classes, i.e. rtl::Locale, com::sun::star::lang::Locale and comphelper::Locale, all have three fields, i.e.
  OUString Language
  OUString Country
  OUString Variant

where Language is typically an ISO-639 [*] code, Country is typically in ISO-3166, and Variant is basically an undefined bucket with API documentation copied and pasted from the Java Locale class.

We don't have a good way to describe various languages with just Language and Country, typically those that can be written in different scripts. For some cases we're already kludging things a bit with e.g. sh-RS to describe "Serbian using the Latin script in Serbia" (as opposed to sr-RS to describe Serbian written in the default Cyrillic script in Serbia), and in some cases we don't have a way to describe languages we could support if we had a way to transport that info around, e.g. Inuktitut Syllabics Canada (as opposed to iu-CA to describe that language written in the default Latin alphabet, which we could currently support).

In BCP-47 the examples above could be described as sr-Latn-RS and iu-Cans-CA (http://www.rfc-editor.org/rfc/bcp/bcp47.txt), where basically if the second tag exists (of an arbitrary number of tags) and is four letters long it must be an ISO 15924 script code, i.e. the current sr-RS and iu-CA remain valid BCP-47 strings.

In parallel with all this, the glibc locale string is typically language_country.encod...@modifier_string, where language and country are as above, while the modifier strings are typically "cyrillic" and "latin" for those variants, among other more arbitrary ones, e.g. "sr_rs.ut...@latin" for what would be "sr-Latn-RS" in BCP-47 style.

I note here that the rtl Locale parser converts a Unix locale string into an rtl::Locale along the lines of

  Language = language, e.g. sr
  Country = country, e.g. RS
  Variant = all_the_rest_of_the_string, e.g. .ut...@latin

and depends on being able to reverse this conversion back to the original Unix locale string, i.e.
it needs to rebuild sr_rs.ut...@latin from its rtl::Locale structure. The comphelper one does something similar, while com::sun::star::lang::Locale.Variant seems to be pretty much unused throughout OOo.

In our xml format, afaics, where com::sun::star::lang::Locale is basically the structure that backs it, we have just "language" and "country" tags.

So..., how about we adopt a BCP-47 based approach, i.e.

a) Where we are currently describing locales as a string in "iso-format", we use BCP-47. Currently valid locale strings get to remain valid.

b) Where we use a Locale structure, Language and Country stay the same, but we specify a format for the remaining Variant field: a BCP-47-based sequence of tags separated by '-'. The Variant field becomes the equivalent BCP-47 locale string for the totality, minus the language and region tags, with the added rule that the first tag entry *must* be a script code, to ensure forward and backward conversion to an unambiguous BCP-47 string. In this scheme the script tag at the start of the Variant can (and must) be empty to denote the default script.

c) Where we use "language" and "country" codes in our xml format, we add a "language-tags" attribute which maps directly to that Variant field.

i.e. sr-Latn-RS becomes

  Language = sr
  Country = RS
  Variant = Latn

i.e. sr-Latn-RS-whatever-foo becomes

  Language = sr
  Country = RS
  Variant = Latn-whatever-foo

a BCP-47 string of de-DE-1901 becomes

  Language = de
  Country = DE
  Variant = -1901

de-DE remains

  Language = de
  Country = DE
  Variant = (empty)

Parsers that want to convert a Unix Locale into the above structure can take, e.g.
aa_er.ut...@saaho and make it into

  Language = aa
  Country = ER
  Variant = -.ut...@saaho

to give a reversible scheme where the original Unix Locale string can be reconstructed. For Unix Locale strings which hint at the script in use, we can parse sr_rs.ut...@latin into

  Language = sr
  Country = RS
  Variant = latn-.ut...@latin

and remain reversible into the original Unix Locale string, while also providing a non-null script tag, which allows continued conversion from the rtl::Locale class to the com::sun::star::lang::Locale one without losing script tag information.

The xml format for a style that sets the Language of a paragraph to Inuktitut Syllabics Canada could then use an additional language-tags attribute, e.g.

  <style:text-properties fo:language="iu" fo:country="CA" fo:language-tags="Cans"/>

while the "Locales" string of the spellchecker can use BCP-47 format, e.g. declare support for "iu-Cans-CA".

C.

1. See #i111019# for the sal issue to parse three-letter codes there, and not limit it to two-letter codes.

2. Presumably it would be best to prefer *generating* sh-RS for backwards compatibility, while still accepting sr-Latn-RS.

3. comphelper::Locale is very little used. It looks like a good idea to move uses of it over to com::sun::star::lang::Locale and convert it to some calls that operate on that instead, and/or merge the unused bits over to e.g. MSLangId.
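
To make the mapping in (b) above concrete, here is a rough sketch of the split and the reverse join. It is written in Python purely for illustration and is not the rtl::Locale or comphelper code. The function names to_locale and to_tag are made up for this example, and it assumes well-formed input of the shape language[-Script][-REGION][-other-tags]; it does not handle Unix locale strings, extlang subtags or grandfathered tags.

```python
import re

# Assumption: a script subtag is exactly four letters (ISO 15924), and a
# region subtag is two letters or three digits, as in BCP-47.
_SCRIPT = re.compile(r"^[A-Za-z]{4}$")
_REGION = re.compile(r"^(?:[A-Za-z]{2}|[0-9]{3})$")


def to_locale(tag):
    """Split a BCP-47 style string into (Language, Country, Variant)."""
    parts = tag.split("-")
    language, rest = parts[0], parts[1:]

    script = ""
    if rest and _SCRIPT.match(rest[0]):
        script = rest.pop(0)

    country = ""
    if rest and _REGION.match(rest[0]):
        country = rest.pop(0)

    # Variant always starts with the script slot, which is left empty for
    # the default script, so "de-DE-1901" yields "-1901".
    variant = "-".join([script] + rest) if (script or rest) else ""
    return language, country, variant


def to_tag(language, country, variant):
    """Rebuild the BCP-47 style string from the three Locale fields."""
    # Everything before the first '-' is the (possibly empty) script slot.
    script, _, rest = variant.partition("-")
    parts = [language]
    if script:
        parts.append(script)
    if country:
        parts.append(country)
    if rest:
        parts.append(rest)
    return "-".join(parts)


if __name__ == "__main__":
    for tag in ("sr-Latn-RS", "sr-Latn-RS-whatever-foo", "de-DE-1901", "de-DE"):
        fields = to_locale(tag)
        print(tag, "->", fields, "->", to_tag(*fields))
```

The empty leading script slot is what keeps the conversion unambiguous in both directions: a Variant of "-1901" can only mean "default script, then the 1901 tag", so it cannot be confused with a script code.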