Tjones has uploaded a new change for review. ( https://gerrit.wikimedia.org/r/350247 )
Change subject: Support Wiki Abbreviation for Czech (cs vs cz)
......................................................................
Support Wiki Abbreviation for Czech (cs vs cz)
Wikis use cs (the ISO 639-1 language code) for Czech, while Lucene
names its Czech analyzer package cz. Support both here (1) so that a
language code of "cs" extracted from the wiki name works and (2)
because cz isn't used for anything else.
Change-Id: I52b5375010fd81730d5e835b4b2accfe93a51517
---
M refinery-core/src/test/resources/stemmer_test_data.csv
M refinery-hive/src/main/java/org/wikimedia/analytics/refinery/hive/StemmerUDF.java
2 files changed, 4 insertions(+), 0 deletions(-)
git pull ssh://gerrit.wikimedia.org:29418/analytics/refinery/source refs/changes/47/350247/1
diff --git a/refinery-core/src/test/resources/stemmer_test_data.csv b/refinery-core/src/test/resources/stemmer_test_data.csv
index 758d530..6e79a80 100644
--- a/refinery-core/src/test/resources/stemmer_test_data.csv
+++ b/refinery-core/src/test/resources/stemmer_test_data.csv
@@ -6,5 +6,6 @@
 Testando o braço brasileiro,br,test brac brasileir
 provar el català derivats,ca,prov catal deriv
 Testování české vyplývající,cz,testován česk vyplývajík
+Testování české vyplývající,cs,testován česk vyplývajík
 afprøvning af dansk hidrører,da,afprøvning dansk hidrør
 foobar,foo,foobar
diff --git a/refinery-hive/src/main/java/org/wikimedia/analytics/refinery/hive/StemmerUDF.java b/refinery-hive/src/main/java/org/wikimedia/analytics/refinery/hive/StemmerUDF.java
index 5bc8e5c..9ff1c42 100644
--- a/refinery-hive/src/main/java/org/wikimedia/analytics/refinery/hive/StemmerUDF.java
+++ b/refinery-hive/src/main/java/org/wikimedia/analytics/refinery/hive/StemmerUDF.java
@@ -125,6 +125,9 @@
         analyzersCache.put("bg", new BulgarianAnalyzer());
         analyzersCache.put("br", new BrazilianAnalyzer());
         analyzersCache.put("ca", new CatalanAnalyzer());
+        // wikis use cs for Czech, Lucene uses cz;
+        // support both since cz is not used for anything else
+        analyzersCache.put("cs", new CzechAnalyzer());
         analyzersCache.put("cz", new CzechAnalyzer());
         analyzersCache.put("da", new DanishAnalyzer());
         analyzersCache.put("de", new GermanAnalyzer());
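
For illustration only, the effect of registering both codes can be sketched
without a Lucene dependency. This is a minimal sketch, not the StemmerUDF
code: the class name LanguageAliasSketch, the helper languageFromWiki, and the
use of analyzer names as strings in place of analyzer instances are all
assumptions made for the example. How the real code derives a language code
from a wiki name, and what it does for unknown codes, is not shown in this
change.

```java
import java.util.HashMap;
import java.util.Map;

public class LanguageAliasSketch {
    // Hypothetical stand-in for analyzersCache: maps a language code to an
    // analyzer name (a String here, so the sketch runs without Lucene).
    private static final Map<String, String> ANALYZERS = new HashMap<>();

    static {
        ANALYZERS.put("ca", "CatalanAnalyzer");
        ANALYZERS.put("cs", "CzechAnalyzer"); // code used by wikis
        ANALYZERS.put("cz", "CzechAnalyzer"); // code used by Lucene
        ANALYZERS.put("da", "DanishAnalyzer");
    }

    // Hypothetical helper: assumes wiki names look like "<lang>wiki",
    // e.g. "cswiki". Names without that suffix are returned unchanged.
    static String languageFromWiki(String wikiName) {
        String suffix = "wiki";
        return wikiName.endsWith(suffix)
                ? wikiName.substring(0, wikiName.length() - suffix.length())
                : wikiName;
    }

    // Returns the registered analyzer name, or null for an unknown code.
    static String analyzerFor(String languageCode) {
        return ANALYZERS.get(languageCode);
    }

    public static void main(String[] args) {
        // Both the wiki-derived code and the Lucene code resolve to Czech.
        System.out.println(analyzerFor(languageFromWiki("cswiki")));
        System.out.println(analyzerFor("cz"));
    }
}
```

Under these assumptions, a lookup by "cs" and a lookup by "cz" return the same
analyzer name, which is the behavior the added test row in
stemmer_test_data.csv checks for the real UDF.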
--
To view, visit https://gerrit.wikimedia.org/r/350247
Gerrit-MessageType: newchange
Gerrit-Change-Id: I52b5375010fd81730d5e835b4b2accfe93a51517
Gerrit-PatchSet: 1
Gerrit-Project: analytics/refinery/source
Gerrit-Branch: master
Gerrit-Owner: Tjones <[email protected]>