OliverKeyes has uploaded a new change for review.
https://gerrit.wikimedia.org/r/187227
Change subject: Include requests with 304 status codes
......................................................................
Include requests with 304 status codes
As pointed out by Bob West (thanks Bob!), requests that return a 304
status are perfectly valid pageviews: they are simply for content that
is locally cached and has not changed server-side since it was cached.
We should include those pageviews.
This patch does so, and also expands the test resources to
include an example with a 304 HTTP status, to prevent regressions.
Change-Id: Ic786037360eeae33726ffea3293d2c4dbb96f10a
---
M refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/Pageview.java
M refinery-core/src/test/resources/pageview_test_data.csv
2 files changed, 7 insertions(+), 2 deletions(-)
git pull ssh://gerrit.wikimedia.org:29418/analytics/refinery/source refs/changes/27/187227/1
diff --git a/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/Pageview.java b/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/Pageview.java
index e52777d..c3cd54a 100644
--- a/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/Pageview.java
+++ b/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/Pageview.java
@@ -62,6 +62,10 @@
"text/html; charset=UTF-8"
));
+ private static final HashSet<String> httpStatusesSet = new HashSet<String>(Arrays.asList(
+ "200",
+ "304"
+ ));
/**
* All API request uriPaths will contain this
@@ -154,8 +158,8 @@
uriHost = uriHost.toLowerCase();
return (
- // All pageviews have 200 HTTP status
- httpStatus.equals("200")
+ // All pageviews have a 200 or 304 HTTP status
+ httpStatusesSet.contains(httpStatus)
// check for a regular pageview contentType, or a an API contentType
&& (
(contentTypesSet.contains(contentType) && !stringContains(uriPath, uriPathAPI))
diff --git a/refinery-core/src/test/resources/pageview_test_data.csv b/refinery-core/src/test/resources/pageview_test_data.csv
index ff7ae7f..e71364a 100644
--- a/refinery-core/src/test/resources/pageview_test_data.csv
+++ b/refinery-core/src/test/resources/pageview_test_data.csv
@@ -1,5 +1,6 @@
test_description, is_pageview,is_legacy_pageview,ip_address,x_forwarded_for, uri_host, uri_path, uri_query, http_status, content_type, user_agent
Is Pageview - Desktop, true,true,174.62.175.82,-,en.wikipedia.org, /wiki/Horseshoe_crab,-,200,text/html, turnip
+Is Pageview – Desktop – locally cached content, true,true,174.62.175.82,-,en.wikipedia.org, /wiki/Horseshoe_crab,-,304,text/html, turnip
Is Pageview - App, true,false,174.62.175.83,-,en.wikipedia.org, /w/api.php, ?action=mobileview&sections=0,200, application/json, WikipediaApp/1.2.3
Is Pageview – Mobile Web, true,true,174.62.175.84,-,en.m.wikipedia.org,/wiki/Bernard_Manning,-,200,text/html,rutabaga
Is Pageview – Desktop - Serbian sr-ec, true,false,174.62.175.85,-,sr.wikipedia.org,/sr-ec/Историја_Срба_пре_Немањића,-,200,text/html,Three-finger salute
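
For reviewers who want to try the status check outside of Pageview.java, the set-membership test introduced above can be sketched in isolation roughly as follows. This is only an illustration: the class name StatusCheckSketch and the method hasPageviewStatus are hypothetical and do not exist in refinery-core; only the "200"/"304" set mirrors the patch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical standalone sketch; not part of refinery-core.
class StatusCheckSketch {

    // Mirrors httpStatusesSet from the patch: statuses that may count as pageviews.
    private static final Set<String> HTTP_STATUSES =
        new HashSet<String>(Arrays.asList("200", "304"));

    // Returns true when the status alone does not rule the request out.
    // The real check in Pageview.java also looks at content type, URI path, etc.
    static boolean hasPageviewStatus(String httpStatus) {
        return HTTP_STATUSES.contains(httpStatus);
    }

    public static void main(String[] args) {
        System.out.println(hasPageviewStatus("200")); // true
        System.out.println(hasPageviewStatus("304")); // true
        System.out.println(hasPageviewStatus("404")); // false
    }
}
```

Using a set rather than chained equals() calls keeps the check a single lookup if further statuses ever need to be added.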
--
To view, visit https://gerrit.wikimedia.org/r/187227
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic786037360eeae33726ffea3293d2c4dbb96f10a
Gerrit-PatchSet: 1
Gerrit-Project: analytics/refinery/source
Gerrit-Branch: master
Gerrit-Owner: OliverKeyes <[email protected]>