https://bugzilla.wikimedia.org/show_bug.cgi?id=164

--- Comment #196 from Philippe Verdy <verd...@wanadoo.fr> 2010-07-26 20:37:46 
UTC ---
Note that the CollatorFactory may fail to locate the specified locale for which
a collator is being requested. Additionally, several locales may share exactly
the same collator.

In all cases, the CollatorFactory will return a valid Collator object whose
locale can be identified. The Collator object returned by:

$collator = $wgCollatorFactory->get($locale, $level);

will expose the effective locale code (normalized) and the collation level
from which it was effectively built. You should be able to access them simply
with something like:

$effectiveLocale = $collator->locale();
$effectiveLevel = $collator->level();

after just getting the collator instance from the factory.
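
A minimal sketch of how the effective locale could be determined, assuming a
simple fallback that drops trailing subtags until tailoring data is found. The
names SketchCollator and resolveEffectiveLocale, and the list of available
locales, are hypothetical and only illustrate the idea:

```php
<?php
// Hypothetical sketch: none of these names come from MediaWiki or ICU.

/** Minimal collator stub that only remembers what it was built from. */
class SketchCollator {
    private $locale;
    private $level;

    public function __construct($locale, $level) {
        $this->locale = $locale;
        $this->level = $level;
    }

    public function locale() { return $this->locale; }
    public function level() { return $this->level; }
}

/**
 * Resolve a requested locale to one that has tailoring data, by dropping
 * trailing subtags until a match is found; fall back to "root" otherwise.
 */
function resolveEffectiveLocale($requested, array $available) {
    // Normalize: underscores to hyphens, lower case.
    $candidate = strtolower(str_replace('_', '-', $requested));
    while ($candidate !== '') {
        if (in_array($candidate, $available, true)) {
            return $candidate;
        }
        $pos = strrpos($candidate, '-');
        $candidate = ($pos === false) ? '' : substr($candidate, 0, $pos);
    }
    return 'root';
}

$available = array('root', 'fr', 'de');   // assumed set of tailored locales
$collator = new SketchCollator(resolveEffectiveLocale('fr_CA', $available), 1);
echo $collator->locale(), ' ', $collator->level(), "\n";   // prints "fr 1"
```

Here the caller asked for fr_CA but can see from locale() that the collator was
effectively built for fr.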

This may be useful to avoid storing duplicate, equivalent binary sortkeys, or
simply to determine which effective locale to use in SQL select queries (to
retrieve the sorted list of pagenames ordered by a specified locale), once the
SQL schema is able to store several sortkeys for the same page in the same
category.
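
As a rough illustration of the deduplication, the requested locales can be
grouped by their effective locale so that only one sortkey per group needs to
be computed and stored. The mapping below is an assumed stand-in for what
$collator->locale() would report, and groupByEffectiveLocale is a hypothetical
helper:

```php
<?php
// Hypothetical sketch: the map stands in for whatever the factory would
// report through $collator->locale() for each requested locale.
$effectiveLocaleOf = array(
    'en'    => 'root',
    'en-gb' => 'root',
    'fr'    => 'fr',
    'fr-ca' => 'fr',
);

/**
 * Group requested locales by effective locale, so that one sortkey is
 * computed and stored per distinct collator.
 */
function groupByEffectiveLocale(array $requested, array $effectiveLocaleOf) {
    $groups = array();
    foreach ($requested as $locale) {
        $effective = isset($effectiveLocaleOf[$locale])
            ? $effectiveLocaleOf[$locale]
            : 'root';
        $groups[$effective][] = $locale;
    }
    return $groups;
}

$groups = groupByEffectiveLocale(
    array('en', 'fr', 'fr-ca', 'en-gb'),
    $effectiveLocaleOf
);
foreach ($groups as $effective => $locales) {
    echo $effective, ' <- ', implode(', ', $locales), "\n";
}
```

Four requested locales end up needing only two stored sortkeys in this example.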

The factory will also instantiate a collator for a given effective locale and
effective collation level only once, caching it in an internal array for
repeated use.
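
Something like the following could implement the caching. This is only a
sketch: CachingCollatorFactory, CachedCollator and the builds counter are
hypothetical, and the locale resolution is reduced to keeping the language
subtag:

```php
<?php
// Hypothetical sketch; class and method names are illustrative only.

class CachedCollator {
    public $locale;
    public $level;

    public function __construct($locale, $level) {
        $this->locale = $locale;
        $this->level = $level;
    }
}

class CachingCollatorFactory {
    /** @var array "effective locale/level" key => CachedCollator */
    private $cache = array();
    /** @var int number of collators actually built (for illustration) */
    public $builds = 0;

    /** Stand-in for real locale resolution: keep only the language subtag. */
    private function effectiveLocale($locale) {
        $parts = explode('-', strtolower(str_replace('_', '-', $locale)));
        return $parts[0];
    }

    public function get($locale, $level) {
        $effective = $this->effectiveLocale($locale);
        $key = $effective . '/' . (int)$level;
        if (!isset($this->cache[$key])) {
            // The expensive table preparation would happen here, once.
            $this->cache[$key] = new CachedCollator($effective, (int)$level);
            $this->builds++;
        }
        return $this->cache[$key];
    }
}

$factory = new CachingCollatorFactory();
$a = $factory->get('fr_CA', 1);
$b = $factory->get('fr-FR', 1);
$c = $factory->get('fr', 2);
var_dump($a === $b);          // bool(true): same effective locale and level
var_dump($a === $c);          // bool(false): different level
echo $factory->builds, "\n";  // prints 2
```

The cache is keyed on the effective locale and level, not on the requested
ones, so equivalent requests share one instance.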

This will avoid repeating the complex preparation of tables, and will avoid
building tables for all supported languages (for example on Commons, where lots
of languages may be desirable, each one with possibly several sort options, or
supported conversions to other scripts or script variants).

The factory, however, should probably load the DUCET table associated with the
CLDR "root" locale completely and immediately when it is first instantiated and
stored in the global variable (there is then no need to test on each call
whether lazily initialized member fields are still null); and it should most
probably build the default collator (for $locale=$wgContentLanguage and
$collationlevel=1) immediately, storing it in the first position of its member
array of already prepared Collator instances.
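
To make the eager variant concrete, here is a small sketch in which the
constructor loads the root table and builds the default collator at once.
EagerCollatorFactory and loadRootTable() are hypothetical, collators are plain
arrays, and the two-entry table is only a placeholder for DUCET:

```php
<?php
// Hypothetical sketch: loadRootTable() is a placeholder, not a real loader.

class EagerCollatorFactory {
    private $rootTable;
    private $collators = array();   // list of already prepared collators
    public $tableLoads = 0;         // counts table loads (for illustration)

    public function __construct($contentLanguage) {
        // Load the root table once, up front, so get() never tests for null.
        $this->rootTable = $this->loadRootTable();
        // Build the default collator immediately; it occupies index 0.
        $this->collators[] = $this->build($contentLanguage, 1);
    }

    private function loadRootTable() {
        $this->tableLoads++;
        return array('a' => 1, 'b' => 2);   // tiny stand-in for DUCET
    }

    private function build($locale, $level) {
        return array(
            'locale' => $locale,
            'level'  => $level,
            'table'  => $this->rootTable,
        );
    }

    public function get($locale, $level) {
        foreach ($this->collators as $collator) {
            if ($collator['locale'] === $locale
                && $collator['level'] === $level) {
                return $collator;
            }
        }
        $collator = $this->build($locale, $level);
        $this->collators[] = $collator;
        return $collator;
    }

    public function prepared() {
        return count($this->collators);
    }
}

$factory = new EagerCollatorFactory('en');
$default = $factory->get('en', 1);   // already prepared by the constructor
$french = $factory->get('fr', 1);    // built on demand
echo $factory->prepared(), ' ', $factory->tableLoads, "\n";   // prints "2 1"
```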

But you may think the opposite, in order to speed up the server startup by some
(milli-)seconds or to reduce the initial CPU/memory stress on the garbage
collector of PHP. However, I'm not convinced that the server will be ready
faster, and the extra tests that would be performed at each use of the
$wgCollatorFactory->get() method may impact performance at runtime...

Note also that ICU uses the same approach of a CollatorFactory to build and
cache reusable Collator instances, because it's a proven good design pattern
for implementing and using collators.

A collator object may also be used to compare two texts without even generating
their sortkeys, or without mapping them, so it may help to include this method
in the Collator interface:

$collator->compare($text1, $text2);

that will return an integer (in other words, a Collator also implements the
Comparator interface), by parsing $text1 and $text2 collation element by
collation element up to the end at level 1, comparing their collation weights
only at this level, before restarting with the next level. When the collator
was instantiated at level 1, the successive collation elements need not be
stored, but for higher levels it helps if they are parsed only once and kept
in an indexed array that allows faster lookup, in the table of collation
weights, of the weights for the next levels.
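
A simplified sketch of such a level-by-level comparison follows. It assumes a
toy weight table with exactly one collation element per character and two
levels, ignores contractions, expansions and ignorable weights, and uses the
hypothetical class name LevelwiseComparator:

```php
<?php
// Hypothetical sketch with a toy weight table; real weights come from DUCET.

class LevelwiseComparator {
    /** @var array character => list of toy weights, one per level */
    private $weights = array(
        'a' => array(1, 1), 'A' => array(1, 2),
        'b' => array(2, 1), 'B' => array(2, 2),
    );
    private $maxLevel;

    public function __construct($maxLevel) {
        // The toy table only has two levels.
        $this->maxLevel = max(1, min((int)$maxLevel, 2));
    }

    /** Parse a text once into an indexed array of collation elements. */
    private function elements($text) {
        $elements = array();
        foreach (str_split($text) as $char) {
            if (isset($this->weights[$char])) {
                $elements[] = $this->weights[$char];
            }
            // Characters without weights are ignored in this toy table.
        }
        return $elements;
    }

    /** Return -1, 0 or 1, comparing all of level 1 before level 2. */
    public function compare($text1, $text2) {
        $e1 = $this->elements($text1);
        $e2 = $this->elements($text2);
        $n = min(count($e1), count($e2));
        for ($level = 0; $level < $this->maxLevel; $level++) {
            for ($i = 0; $i < $n; $i++) {
                if ($e1[$i][$level] !== $e2[$i][$level]) {
                    return $e1[$i][$level] < $e2[$i][$level] ? -1 : 1;
                }
            }
            if (count($e1) !== count($e2)) {
                // Equal common prefix at this level: the shorter text is first.
                return count($e1) < count($e2) ? -1 : 1;
            }
        }
        return 0;
    }
}

$level1 = new LevelwiseComparator(1);
$level2 = new LevelwiseComparator(2);
var_dump($level1->compare('ab', 'AB'));   // int(0): equal at level 1
var_dump($level2->compare('ab', 'AB'));   // int(-1): decided at level 2
var_dump($level2->compare('aB', 'Ab'));   // int(-1): first level-2 difference
```

For simplicity the sketch always keeps the parsed elements in an array; a
level-1-only collator could compare while parsing and skip that storage.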
