A notable reduction in libxul size (reduction of 544 KB on aarch64 Android) will be observable in the next Nightly if all goes well. I'm sending this so that people don't need to investigate where the libxul size change comes from.
The change comes from https://bugzilla.mozilla.org/show_bug.cgi?id=1793749 and https://bugzilla.mozilla.org/show_bug.cgi?id=1630920, both of which implement behaviors that Chrome already has. These changes also reflect the (upcoming) ICU4X defaults.

(TL;DR ends here; long version follows)

Bug 1630920 removes the zh-u-co-big5han and zh-u-co-gb2312 collations for consistency with Chrome. Chrome excludes these with the comment: "big5han and gb2312han collation do not make any sense and nobody uses them." Therefore, Web authors cannot rely on them being present cross-browser anyway.

From source code archaeology, I infer that long ago ICU got its first Traditional Chinese and Simplified Chinese collations created in the same manner as the Japanese collation: by the order of the legacy coded character sets. When more appropriate default collations were introduced (by stroke count for Traditional Chinese and by Pinyin for Simplified Chinese), the collations that were already in ICU got renamed instead of getting removed.

There is now an issue open to remove the legacy coded character set-based ones from CLDR: https://unicode-org.atlassian.net/browse/CLDR-16062

Bug 1793749 changes the _root_ collation to use implicit rather than explicit ordering for Han characters. This change is in principle a reduction in correctness, but Chrome is already shipping this reduction in correctness, so Web authors cannot rely on the more-correct-in-principle behavior across browsers anyway.

Copypaste from the bug:

ICU supports two variants of the root collation: unihan and implicithan.

unihan puts all Han characters across blocks of different ages into unified radical-stroke order, which is theoretically proper but involves explicit data.
implicithan explicitly orders the blocks (main ideograph block before Extension A, even though Extension A comes first in code point order) and then within each block implies the order from the code point (radical-stroke within each block), which is OK enough in practice and involves less data.

The reason why implicithan is OK enough in practice is two-fold:

1. None of the CJK locales use the root order for common characters in the respective languages. They all use tailorings, so unihan vs. implicithan is relevant only for the purpose of giving *some* order to characters that are so rare that the language-specific tailoring doesn't cover them.

2. Since each block, including the main ideographic block, is internally ordered by radical-stroke, the difference is irrelevant to comparison of characters that are common enough to be covered by the main ideographic block.

- -

P.S. Chrome also excludes zh-u-co-unihan without excluding ja-u-co-unihan or ko-u-co-unihan. I have not aligned Firefox on this, because I'm not completely convinced about what's appropriate. So far, however, I am unaware of any app using *-u-co-unihan collation orders for anything other than building human-browsable lookup indexes for dictionaries (as opposed to sorting search results). The Web Platform does not currently provide an API for generating a bucketed index (for English, you'd have a bucket for each letter from A to Z; for *-u-co-unihan, you'd have a bucket for each radical), and it's unclear if *-u-co-unihan index generation even works properly with the implicithan root due to the way a couple of the bucket reference points attach to characters from outside the main ideographic block.

If you are curious about probing a given browser, you can use https://hsivonen.com/test/moz/zh-collations.html ("cjk" would be a more proper name than "zh", but by the time I added Japanese and Korean tests, there were already links to that URL out there.)
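
For a quick check from a console instead of the test page, a minimal sketch along the following lines may be enough. It assumes that the engine reports the collation type it actually resolved through Intl.Collator's resolvedOptions(), and that an unavailable -u-co- type is ignored and reported as "default". The helper name resolvedCollation is hypothetical and does not come from either bug.

```typescript
// Hypothetical helper (not from the bugs above): reports which collation
// type the engine actually resolves for a requested BCP 47 language tag.
function resolvedCollation(tag: string): string {
  // Assumption: when the requested -u-co- type is not available, the
  // engine ignores it and reports "default" here.
  return new Intl.Collator(tag).resolvedOptions().collation;
}

// Collation tags mentioned in this message.
const tags: string[] = [
  "zh-u-co-big5han",
  "zh-u-co-gb2312",
  "zh-u-co-unihan",
  "ja-u-co-unihan",
  "ko-u-co-unihan",
];

for (const tag of tags) {
  // The printed value depends on the engine and on the ICU data it ships.
  console.log(`${tag} -> ${resolvedCollation(tag)}`);
}
```

This only shows whether a collation type is recognized. It does not say anything about whether the root collation underneath was built as unihan or implicithan.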
-- 
Henri Sivonen
