jenkins-bot has submitted this change and it was merged.

Change subject: icu_tokenizer: add a default set of language codes
......................................................................

icu_tokenizer: add a default set of language codes

Adds the list of languages identified in
https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Spaceless_Writing_Systems_and_Wiki-Projects

Change-Id: I2e3a13b12cb5cf9ebc3d7c98486c346e03acab2a
---
M includes/Maintenance/AnalysisConfigBuilder.php
1 file changed, 24 insertions(+), 1 deletion(-)

Approvals:
  Cindy-the-browser-test-bot: Looks good to me, but someone else must approve
  Tjones: Looks good to me, approved
  EBernhardson: Looks good to me, but someone else must approve
  jenkins-bot: Verified

diff --git a/includes/Maintenance/AnalysisConfigBuilder.php b/includes/Maintenance/AnalysisConfigBuilder.php
index 00aec8f..aade8d1 100644
--- a/includes/Maintenance/AnalysisConfigBuilder.php
+++ b/includes/Maintenance/AnalysisConfigBuilder.php
@@ -922,7 +922,30 @@
 	 * @var bool[] indexed by language code, languages where ICU tokenization
 	 * can be enabled by default
 	 */
-	private $languagesWithIcuTokenization = [];
+	private $languagesWithIcuTokenization = [
+		"bo" => true,
+		"dz" => true,
+		"gan" => true,
+		"ja" => true,
+		"km" => true,
+		"lo" => true,
+		"my" => true,
+		"th" => true,
+		"wuu" => true,
+		"zh" => true,
+		"lzh" => true, // zh-classical
+		"zh-classical" => true, // deprecated code fo lzh
+		"yue" => true, // zh-yue
+		"zh-yue" => true, // deprecated code for yue
+		// This list below are languages that may use use mixed scripts
+		"bug" => true,
+		"cdo" => true,
+		"cr" => true,
+		"hak" => true,
+		"jv" => true,
+		"nan" => true, // zh-min-nan
+		"zh-min-nan" => true, // deprecated code for nan
+	];
 
 	/**
 	 * @var array[]

-- 
To view, visit https://gerrit.wikimedia.org/r/321393

Gerrit-MessageType: merged
Gerrit-Change-Id: I2e3a13b12cb5cf9ebc3d7c98486c346e03acab2a
Gerrit-PatchSet: 7
Gerrit-Project: mediawiki/extensions/CirrusSearch
Gerrit-Branch: master
Gerrit-Owner: DCausse <[email protected]>
Gerrit-Reviewer: Cindy-the-browser-test-bot <[email protected]>
Gerrit-Reviewer: DCausse <[email protected]>
Gerrit-Reviewer: EBernhardson <[email protected]>
Gerrit-Reviewer: Gehel <[email protected]>
Gerrit-Reviewer: Manybubbles <[email protected]>
Gerrit-Reviewer: Smalyshev <[email protected]>
Gerrit-Reviewer: Tjones <[email protected]>
Gerrit-Reviewer: jenkins-bot <>
