jenkins-bot has submitted this change and it was merged.

Change subject: icu_tokenizer: add a default set of language codes
......................................................................


icu_tokenizer: add a default set of language codes

Adds the list of languages identified in
https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Spaceless_Writing_Systems_and_Wiki-Projects

Change-Id: I2e3a13b12cb5cf9ebc3d7c98486c346e03acab2a
---
M includes/Maintenance/AnalysisConfigBuilder.php
1 file changed, 24 insertions(+), 1 deletion(-)

Approvals:
  Cindy-the-browser-test-bot: Looks good to me, but someone else must approve
  Tjones: Looks good to me, approved
  EBernhardson: Looks good to me, but someone else must approve
  jenkins-bot: Verified



diff --git a/includes/Maintenance/AnalysisConfigBuilder.php b/includes/Maintenance/AnalysisConfigBuilder.php
index 00aec8f..aade8d1 100644
--- a/includes/Maintenance/AnalysisConfigBuilder.php
+++ b/includes/Maintenance/AnalysisConfigBuilder.php
@@ -922,7 +922,30 @@
         * @var bool[] indexed by language code, languages where ICU tokenization
         * can be enabled by default
         */
-       private $languagesWithIcuTokenization = [];
+       private $languagesWithIcuTokenization = [
+               "bo" => true,
+               "dz" => true,
+               "gan" => true,
+               "ja" => true,
+               "km" => true,
+               "lo" => true,
+               "my" => true,
+               "th" => true,
+               "wuu" => true,
+               "zh" => true,
+               "lzh" => true, // zh-classical
+               "zh-classical" => true, // deprecated code for lzh
+               "yue" => true, // zh-yue
+               "zh-yue" => true, // deprecated code for yue
+               // The languages listed below may use mixed scripts
+               "bug" => true,
+               "cdo" => true,
+               "cr" => true,
+               "hak" => true,
+               "jv" => true,
+               "nan" => true, // zh-min-nan
+               "zh-min-nan" => true, // deprecated code for nan
+       ];
 
        /**
         * @var array[]
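
Because the property is a map keyed by language code rather than a flat list, checking a language is a single array access. A minimal sketch of how such a map might be consulted, assuming a hypothetical helper shouldUseIcuTokenizer() and a shortened copy of the list; neither name nor helper is taken from CirrusSearch itself:

```php
<?php
// Shortened, illustrative copy of the map added in this change.
$languagesWithIcuTokenization = [
    'th' => true,
    'zh' => true,
    'lzh' => true,
    'zh-classical' => true, // deprecated code for lzh
];

/**
 * Hypothetical helper: reports whether ICU tokenization is enabled
 * by default for the given language code. Codes missing from the
 * map are treated as disabled.
 */
function shouldUseIcuTokenizer( array $enabledLanguages, string $langCode ): bool {
    return ( $enabledLanguages[$langCode] ?? false ) === true;
}

var_dump( shouldUseIcuTokenizer( $languagesWithIcuTokenization, 'th' ) ); // bool(true)
var_dump( shouldUseIcuTokenizer( $languagesWithIcuTokenization, 'en' ) ); // bool(false)
```

Listing both the current and the deprecated code (for example lzh and zh-classical) means the lookup needs no alias resolution step, at the cost of a few duplicate entries.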

-- 
To view, visit https://gerrit.wikimedia.org/r/321393
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I2e3a13b12cb5cf9ebc3d7c98486c346e03acab2a
Gerrit-PatchSet: 7
Gerrit-Project: mediawiki/extensions/CirrusSearch
Gerrit-Branch: master
Gerrit-Owner: DCausse <[email protected]>
Gerrit-Reviewer: Cindy-the-browser-test-bot <[email protected]>
Gerrit-Reviewer: DCausse <[email protected]>
Gerrit-Reviewer: EBernhardson <[email protected]>
Gerrit-Reviewer: Gehel <[email protected]>
Gerrit-Reviewer: Manybubbles <[email protected]>
Gerrit-Reviewer: Smalyshev <[email protected]>
Gerrit-Reviewer: Tjones <[email protected]>
Gerrit-Reviewer: jenkins-bot <>

_______________________________________________
MediaWiki-commits mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
