OliverKeyes has submitted this change and it was merged.

Change subject: Google detection bug fix
......................................................................
Google detection bug fix

- We noticed that requests referred from http://google.com were
  not being detected as coming from Google. This patch aims to
  fix that.

Change-Id: Iab14d7398031f447e2aa8305a36895bfd17d141c
---
M refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/SearchEngine.java
M refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/SearchEngineClassifier.java
M refinery-core/src/test/resources/referer_test_data.csv
3 files changed, 3 insertions(+), 2 deletions(-)

Approvals:
  OliverKeyes: Verified; Looks good to me, approved

diff --git a/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/SearchEngine.java b/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/SearchEngine.java
index 2339abf..60f121e 100644
--- a/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/SearchEngine.java
+++ b/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/SearchEngine.java
@@ -25,7 +25,7 @@
  * to specify a string with spaces and symbols if needed.
  */
 public enum SearchEngine {
-    GOOGLE("Google", "\\.google\\."),
+    GOOGLE("Google", "\\.?google\\."),
     YAHOO("Yahoo", "search\\.yahoo\\."),
     BING("Bing", "\\.bing\\."),
     YANDEX("Yandex", "yandex\\."),
diff --git a/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/SearchEngineClassifier.java b/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/SearchEngineClassifier.java
index 8fe4bb8..c4a4b7a 100644
--- a/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/SearchEngineClassifier.java
+++ b/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/SearchEngineClassifier.java
@@ -38,7 +38,7 @@
     /*
      * A simple pattern for search identification
      */
-    private static final Pattern searchPattern = Pattern.compile("(\\.(baidu|bing|google)|search\\.yahoo|yandex)\\.");
+    private static final Pattern searchPattern = Pattern.compile("(\\.?(baidu|bing|google)|search\\.yahoo|yandex)\\.");
 
     /**
      * Crudely subsets a referer to just contain the domain,
diff --git a/refinery-core/src/test/resources/referer_test_data.csv b/refinery-core/src/test/resources/referer_test_data.csv
index c6de7b8..8e88ffa 100644
--- a/refinery-core/src/test/resources/referer_test_data.csv
+++ b/refinery-core/src/test/resources/referer_test_data.csv
@@ -2,6 +2,7 @@
 Random internal link,https://zh.wikipedia.org/zh-tw/%E6%96%B9%E6%9D%B1%E6%98%87,internal,false,none
 Nada,-,none,false,none
 Google,https://www.google.co.id/,external (search engine),true,Google
+Google,http://google.com/,external (search engine),true,Google
 Yahoo,http://search.yahoo.co.jp/search?fr=slv1-necpc9&p=%E4%B8%89%E5%8F%88&ei=UTF-8,external (search engine),true,Yahoo
 Random external link,http://www.cowboom.com/product/1617241,external,false,none
 Bing,http://www.bing.com/search?q=Svengali+movie+1931&filters=ufn,external (search engine),true,Bing

-- 
To view, visit https://gerrit.wikimedia.org/r/277679
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Iab14d7398031f447e2aa8305a36895bfd17d141c
Gerrit-PatchSet: 2
Gerrit-Project: analytics/refinery/source
Gerrit-Branch: master
Gerrit-Owner: Bearloga <[email protected]>
Gerrit-Reviewer: Bearloga <[email protected]>
Gerrit-Reviewer: Nuria <[email protected]>
Gerrit-Reviewer: OliverKeyes <[email protected]>
Gerrit-Reviewer: jenkins-bot <>

_______________________________________________
MediaWiki-commits mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
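
Note on the pattern change: the following is a minimal sketch of how the old and new Google pattern strings from SearchEngine.java behave, not the refinery code itself. It assumes the pattern is applied as a substring search (Matcher.find) over the referer string, which this change does not show. The class name GooglePatternSketch, the helper matches, and the host notgoogle.example are hypothetical and used only for illustration.

```java
import java.util.regex.Pattern;

class GooglePatternSketch {
    // Pattern string before the patch: requires a literal dot before "google".
    static final Pattern OLD_GOOGLE = Pattern.compile("\\.google\\.");
    // Pattern string after the patch: the leading dot is optional.
    static final Pattern NEW_GOOGLE = Pattern.compile("\\.?google\\.");

    // Assumption: the classifier does a substring search, not a full match.
    static boolean matches(Pattern pattern, String referer) {
        return pattern.matcher(referer).find();
    }

    public static void main(String[] args) {
        String[] referers = {
            "https://www.google.co.id/",  // existing row in referer_test_data.csv
            "http://google.com/",         // row added by this patch
            "http://notgoogle.example/"   // hypothetical host, not in the test data
        };
        for (String referer : referers) {
            System.out.println(referer
                + " old=" + matches(OLD_GOOGLE, referer)
                + " new=" + matches(NEW_GOOGLE, referer));
        }
    }
}
```

Under that assumption, the new pattern accepts the bare http://google.com/ referer that the old one missed. Because the leading dot is now optional, it would also match any host that merely ends a label with "google", such as the hypothetical notgoogle.example. Whether that matters depends on how the classifier trims the referer before matching.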
