HUZZAH!

> On 13 Jun 2017, at 07:09, Amir E. Aharoni <[email protected]> wrote:
>
> Hi,
>
> Another edition of i18n software news!
>
> Yesterday, a change was deployed in the Bashkir Wikipedia: The categories are
> now sorted in the correct alphabetical order.
>
> Bashkir, like many languages of the Soviet Union, uses the Cyrillic alphabet
> with several extra characters. Without proper software support, the extra
> letters are sorted according to their Unicode character number order, which
> is not very useful. For example, the letter Ө is supposed to be in the middle
> of the alphabet between О and П, but without correct collation it ends up at
> the end, so Ufa (Өфө), the capital of Bashkortostan, used to appear at the
> very end of the alphabet in the "Capitals of Russian regions" category [1],
> but now it appears correctly before П.
>
> This could be resolved by adding the collation for this language to CLDR and
> ICU, and I filed a ticket about this with CLDR [2]. Actually getting it added
> and deployed is a long process, but the MediaWiki developer Brian Wolff
> provided a good interim solution in MediaWiki code itself. The infrastructure
> code around it is surprisingly tricky, but to simply add a new alphabet, you
> just need to create a file like this:
> https://phabricator.wikimedia.org/source/mediawiki/browse/master/includes/collation/BashkirUppercaseCollation.php
>
> When it is added to CLDR and ICU, this stopgap solution can be removed from
> MediaWiki.
>
> As far as I can see, Bashkir is the first language for which such a
> comprehensive solution was made inside MediaWiki, and it is needed for many
> others. I'll start looking for other languages where this is needed. My
> process would be something like this:
> 1. Find a language in which there is a Wikipedia with incorrect collation.
> 2. Find the correct alphabetical order, using a grammar book or a dictionary,
>    and confirm it with editors in that language.
> 3. Submit a ticket to CLDR.
> 4. Add a file with an alphabet, like the Bashkir file above, to MediaWiki
>    core.
> 5. Get it reviewed, merged, and deployed.
> 6. Deploy the change to the projects in that language.
> 7. Run a script that converts the categories to the new collation.
>
> (Steps 5 and 6 sound repetitive because the collation needs to be explicitly
> enabled for each wiki. I filed another bug [4], which suggests defining a
> default collation per language, so that step 6 won't be needed.)
>
> If anybody has better suggestions about working with CLDR and ICU and getting
> them to add and release these collation files faster, I'll be very happy to
> hear them.
>
> [1] http://bit.ly/2sWLJaX
> [2] http://unicode.org/cldr/trac/ticket/10195
> [3] For the confirmation about Bashkir see
>     https://phabricator.wikimedia.org/T162823 .
> [4] https://phabricator.wikimedia.org/T164985
>
> --
> Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
> http://aharoni.wordpress.com
> “We're living in pieces,
> I want to live in peace.” – T. Moore
> _______________________________________________
> Langcom mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/langcom
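
For anyone who wants to see the sorting problem from the quoted message in miniature, here is a rough Python sketch. It is not the MediaWiki implementation (that one is PHP, in the file linked above); the function name bashkir_sort_key and the three sample titles are made up for illustration, and the alphabet string is my own assumption of the usual 42-letter Bashkir order, so it should be checked with editors the same way step 2 describes. The only ordering the message itself confirms is that Ө sits between О and П.

```python
# Assumed Bashkir alphabet order (hypothetical constant, verify before reuse).
# The quoted message only confirms that Ө comes between О and П.
BASHKIR_ALPHABET = "АБВГҒДҘЕЁЖЗИЙКҠЛМНҢОӨПРСҪТУҮФХҺЦЧШЩЪЫЬЭӘЮЯ"

# Rank of each letter inside the alphabet, used instead of its code point.
RANK = {letter: index for index, letter in enumerate(BASHKIR_ALPHABET)}


def bashkir_sort_key(title):
    """Build a sort key that follows the alphabet above.

    Letters found in the alphabet sort by their rank; any other character
    (spaces, digits, Latin letters) sorts after them by code point.
    """
    key = []
    for ch in title.upper():
        if ch in RANK:
            key.append((0, RANK[ch]))
        else:
            key.append((1, ord(ch)))
    return key


# Illustrative page titles: Omsk, Ufa, Perm.
titles = ["Пермь", "Өфө", "Омск"]

# Plain sorting compares Unicode code points, so Ө (U+04E8) lands after П (U+041F).
by_code_point = sorted(titles)

# Sorting with the custom key places Ө between О and П.
by_alphabet = sorted(titles, key=bashkir_sort_key)

print(by_code_point)
print(by_alphabet)
```

The first list puts Өфө last, the second puts it between Омск and Пермь. A real fix through CLDR and ICU would also handle case, digits and mixed scripts properly, which this sketch does not attempt.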
