jenkins-bot has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/350247 )

Change subject: Support Wiki Abbreviation for Czech (cs vs cz)
......................................................................


Support Wiki Abbreviation for Czech (cs vs cz)

Wikis use cs as the abbreviation for Czech; Lucene uses cz. Support
both here: (1) cs, so that a language code extracted from the wiki
name works, and (2) cz, since it isn't used for anything else.

Change-Id: I52b5375010fd81730d5e835b4b2accfe93a51517
---
M refinery-core/src/test/resources/stemmer_test_data.csv
M refinery-hive/src/main/java/org/wikimedia/analytics/refinery/hive/StemmerUDF.java
2 files changed, 4 insertions(+), 0 deletions(-)

Approvals:
  EBernhardson: Looks good to me, approved
  jenkins-bot: Verified
  DCausse: Looks good to me, but someone else must approve



diff --git a/refinery-core/src/test/resources/stemmer_test_data.csv b/refinery-core/src/test/resources/stemmer_test_data.csv
index 758d530..6e79a80 100644
--- a/refinery-core/src/test/resources/stemmer_test_data.csv
+++ b/refinery-core/src/test/resources/stemmer_test_data.csv
@@ -6,5 +6,6 @@
 Testando o braço brasileiro,br,test brac brasileir
 provar el català derivats,ca,prov catal deriv
 Testování české vyplývající,cz,testován česk vyplývajík
+Testování české vyplývající,cs,testován česk vyplývajík
 afprøvning af dansk hidrører,da,afprøvning dansk hidrør
 foobar,foo,foobar
diff --git a/refinery-hive/src/main/java/org/wikimedia/analytics/refinery/hive/StemmerUDF.java b/refinery-hive/src/main/java/org/wikimedia/analytics/refinery/hive/StemmerUDF.java
index 5bc8e5c..9ff1c42 100644
--- a/refinery-hive/src/main/java/org/wikimedia/analytics/refinery/hive/StemmerUDF.java
+++ b/refinery-hive/src/main/java/org/wikimedia/analytics/refinery/hive/StemmerUDF.java
@@ -125,6 +125,9 @@
         analyzersCache.put("bg", new BulgarianAnalyzer());
         analyzersCache.put("br", new BrazilianAnalyzer());
         analyzersCache.put("ca", new CatalanAnalyzer());
+        // wikis use cs for Czech, Lucene uses cz;
+        // support both since cz is not used for anything else
+        analyzersCache.put("cs", new CzechAnalyzer());
         analyzersCache.put("cz", new CzechAnalyzer());
         analyzersCache.put("da", new DanishAnalyzer());
         analyzersCache.put("de", new GermanAnalyzer());
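
The effect of registering both codes can be illustrated with a minimal, dependency-free sketch. Everything named here is hypothetical and not part of the change: the class LanguageCodeAliases, the helper languageFromWiki, the assumption that wiki names are a language code followed by "wiki", and the use of plain strings in place of Lucene analyzer instances. The "none" fallback for unknown codes is likewise an assumption made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the analyzer cache in StemmerUDF.
// Values are analyzer names (strings), not real Lucene analyzers.
class LanguageCodeAliases {
    private static final Map<String, String> ANALYZERS = new HashMap<>();

    static {
        ANALYZERS.put("ca", "CatalanAnalyzer");
        ANALYZERS.put("cs", "CzechAnalyzer"); // abbreviation used by wikis
        ANALYZERS.put("cz", "CzechAnalyzer"); // abbreviation used by Lucene
        ANALYZERS.put("da", "DanishAnalyzer");
    }

    // Assumption: a wiki name is a language code followed by "wiki",
    // e.g. "cswiki". Names without that suffix are returned unchanged.
    static String languageFromWiki(String wikiName) {
        if (wikiName.endsWith("wiki")) {
            return wikiName.substring(0, wikiName.length() - "wiki".length());
        }
        return wikiName;
    }

    // Look up the analyzer name for a language code; "none" is an
    // illustrative fallback for codes that are not registered.
    static String analyzerFor(String lang) {
        return ANALYZERS.getOrDefault(lang, "none");
    }

    public static void main(String[] args) {
        String lang = languageFromWiki("cswiki");
        System.out.println(lang + " -> " + analyzerFor(lang));
        System.out.println("cz -> " + analyzerFor("cz"));
    }
}
```

With both keys present, a code derived from a wiki name and a code written in Lucene's convention resolve to the same analyzer, so callers do not need to translate between the two.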

-- 
To view, visit https://gerrit.wikimedia.org/r/350247
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I52b5375010fd81730d5e835b4b2accfe93a51517
Gerrit-PatchSet: 1
Gerrit-Project: analytics/refinery/source
Gerrit-Branch: master
Gerrit-Owner: Tjones <[email protected]>
Gerrit-Reviewer: DCausse <[email protected]>
Gerrit-Reviewer: EBernhardson <[email protected]>
Gerrit-Reviewer: Joal <[email protected]>
Gerrit-Reviewer: Nuria <[email protected]>
Gerrit-Reviewer: jenkins-bot <>

_______________________________________________
MediaWiki-commits mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
