OliverKeyes has uploaded a new change for review.

  https://gerrit.wikimedia.org/r/187227

Change subject: Include requests with 304 status codes
......................................................................

Include requests with 304 status codes

As pointed out by Bob West (thanks Bob!), 304s are perfectly valid
pageviews: they are responses for content that the client has cached
locally and that has not changed server-side since it was cached. We
should include those pageviews.

This patch does so, and also expands the test resources to
include an example with a 304 HTTP status, to prevent regressions.
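
For illustration only, here is a minimal standalone sketch of the status
check described above. The class name StatusCheckSketch and the helper
hasPageviewStatus are hypothetical and not part of this patch; the real
check lives inside Pageview.java alongside the content-type and URI
conditions.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical standalone class; the patch itself modifies Pageview.java.
class StatusCheckSketch {

    // Statuses that count towards pageviews: 200 (OK) and
    // 304 (Not Modified, i.e. the client's cached copy is still current).
    private static final Set<String> HTTP_STATUSES =
        new HashSet<String>(Arrays.asList("200", "304"));

    // Hypothetical helper: true if the status alone qualifies the request.
    // A null status is simply not in the set, so this returns false.
    static boolean hasPageviewStatus(String httpStatus) {
        return HTTP_STATUSES.contains(httpStatus);
    }

    public static void main(String[] args) {
        for (String status : new String[] {"200", "304", "404"}) {
            System.out.println(status + " -> " + hasPageviewStatus(status));
        }
    }
}
```

Using a set rather than chained equals() calls keeps the condition
readable if further statuses ever need to be accepted.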

Change-Id: Ic786037360eeae33726ffea3293d2c4dbb96f10a
---
M refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/Pageview.java
M refinery-core/src/test/resources/pageview_test_data.csv
2 files changed, 7 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/analytics/refinery/source refs/changes/27/187227/1

diff --git a/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/Pageview.java b/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/Pageview.java
index e52777d..c3cd54a 100644
--- a/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/Pageview.java
+++ b/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/Pageview.java
@@ -62,6 +62,10 @@
         "text/html; charset=UTF-8"
     ));
 
+    private static final HashSet<String> httpStatusesSet = new HashSet<String>(Arrays.asList(
+        "200",
+        "304"
+    ));
 
     /**
      * All API request uriPaths will contain this
@@ -154,8 +158,8 @@
         uriHost = uriHost.toLowerCase();
 
         return (
-            // All pageviews have 200 HTTP status
-            httpStatus.equals("200")
+            // All pageviews have a 200 or 304 HTTP status
+            httpStatusesSet.contains(httpStatus)
             // check for a regular pageview contentType, or a an API contentType
             &&  (
                     (contentTypesSet.contains(contentType) && !stringContains(uriPath, uriPathAPI))
diff --git a/refinery-core/src/test/resources/pageview_test_data.csv b/refinery-core/src/test/resources/pageview_test_data.csv
index ff7ae7f..e71364a 100644
--- a/refinery-core/src/test/resources/pageview_test_data.csv
+++ b/refinery-core/src/test/resources/pageview_test_data.csv
@@ -1,5 +1,6 @@
 test_description, is_pageview,is_legacy_pageview,ip_address,x_forwarded_for, uri_host, uri_path, uri_query, http_status, content_type, user_agent
 Is Pageview - Desktop, true,true,174.62.175.82,-,en.wikipedia.org, /wiki/Horseshoe_crab,-,200,text/html, turnip
+Is Pageview – Desktop – locally cached content, true,true,174.62.175.82,-,en.wikipedia.org, /wiki/Horseshoe_crab,-,304,text/html, turnip
 Is Pageview - App, true,false,174.62.175.83,-,en.wikipedia.org, /w/api.php, ?action=mobileview&sections=0,200, application/json, WikipediaApp/1.2.3
 Is Pageview – Mobile Web, true,true,174.62.175.84,-,en.m.wikipedia.org,/wiki/Bernard_Manning,-,200,text/html,rutabaga
 Is Pageview – Desktop - Serbian sr-ec, true,false,174.62.175.85,-,sr.wikipedia.org,/sr-ec/Историја_Срба_пре_Немањића,-,200,text/html,Three-finger salute

-- 
To view, visit https://gerrit.wikimedia.org/r/187227
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic786037360eeae33726ffea3293d2c4dbb96f10a
Gerrit-PatchSet: 1
Gerrit-Project: analytics/refinery/source
Gerrit-Branch: master
Gerrit-Owner: OliverKeyes <[email protected]>

_______________________________________________
MediaWiki-commits mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits