ArielGlenn has submitted this change and it was merged.

Change subject: datasets: update pagecounts-ez index html
......................................................................


datasets: update pagecounts-ez index html

incorporate changes from erikz, cleanup format for readability
fix name in 'maintained by puppet' notice
Change-Id: I93a2fcf81efca9884c2da2007f1d0e9a02813d3a
---
M modules/dataset/files/html/pagecounts-ez_index.html
1 file changed, 76 insertions(+), 49 deletions(-)

Approvals:
  ArielGlenn: Looks good to me, approved
  jenkins-bot: Verified



diff --git a/modules/dataset/files/html/pagecounts-ez_index.html b/modules/dataset/files/html/pagecounts-ez_index.html
index 731d34d..1c13b77 100644
--- a/modules/dataset/files/html/pagecounts-ez_index.html
+++ b/modules/dataset/files/html/pagecounts-ez_index.html
@@ -1,53 +1,80 @@
 <html>
 <!-- This file is maintained by puppet!! -->
-<!-- modules/dataset/files/html/pagestats-ez_index.html -->
-       <head>
-               <title>Various statistics files maintained by Erik Zachte</title>
-       </head>
-       <body bgcolor="#ffffff">
-               <h1>Stats files maintained by Erik Zachte</h1>
-               <p>Pagecount files repackaged and reformatted, one file per month:
-               <a href="monthly/">link</a>
-               </p>
-               <p>Projectcount files repackaged, one file per year:
-               <a href="projectcounts/">link</a>
-               </p>
-               <p>Raw data for reports at http://stats.wikimedia.org/:
-               <a href="wikistats/">link</a>
-               </p>
-               <hr />
-               <p>Notes about the format of the pagecount files</p>
-               <p>These are
-               derived from Domas' pagecount files but the format is not identical.
-               Each line contains four fields separated by spaces:
-               <ul>
-                       <li>wiki code (subproject.project)</li>
-                       <li>article title</li>
-                       <li>monthly total (with interpolation when data is missing)</li>
-                       <li>hourly counts</li>
-               </ul>
-               In the wiki code, the subproject is the language code (fr, el, ja, etc)
-               and the project is one of b,k,n,q,s,v,z, corresponding to the projects below:
-               <ul>
-                       <li>b:wikibooks</li>
-                       <li>k:wiktionary</li>
-                       <li>n:wikinews</li>
-                       <li>q:wikiquote</li>
-                       <li>s:wikisource</li>
-                       <li>v:wikiversity</li>
-                       <li>z:wikipedia</li>
-               </ul>
-               Hourly counts can be deciphered as follows:
-               <dl>
-                       <dt>Hour:</dt>
-                       <dd>from 0 to 23, written as 0 = A, 1 = B ... 22 = W, 23 = X</dd>
-                       <dt>Day:</dt>
-                       <dd>from 1 to 31, written as 0 = A, 1 = B ... 25 = Y, 26 = Z, 27 = [, 28 = \, 29 = ], 30 = ^, 31 = _</dd>
-               </dl>
-               </p>
-               <p>
-               Source for this information is <a href="http://lists.wikimedia.org/pipermail/wikitech-l/2011-August/054644.html">http://lists.wikimedia.org/pipermail/wikitech-l/2011-August/054644.html</a>.
-               </p>
-       </body>
+<!-- modules/dataset/files/html/pagecounts-ez_index.html -->
+    <head>
+        <title>Wikistats files</title>
+    </head>
+    <body bgcolor="#ffffff">
+        <h1>Wikistats files</h1>
+        <b>Maintained by Erik Zachte</b>
+        <p>
+                  <a href="http://dumps.wikimedia.org/other/pagecounts-ez/merged/">
+                    Hourly page views per article</a>
+                  for around 30 million article titles
+                  (Sept 2013) in around 800+ Wikimedia wikis. Repackaged (with extreme
+                  shrinkage, without losing granularity), corrected, reformatted. Daily
+                  files and two monthly files (see notes below).
+                </p>
+        <p>
+                  <a href="http://dumps.wikimedia.org/other/pagecounts-ez/projectcounts/">
+                    Hourly page views per wiki</a>
+                  , corrected for site outages and underreporting. Also repackaged,
+                  as one tar file per year.
+                </p>
+        <p>
+                  <a href="http://dumps.wikimedia.org/other/pagecounts-ez/wikistats/">
+                    Raw data</a>
+                 for reports at <a href='http://stats.wikimedia.org/'>stats.wikimedia.org</a>.
+                </p>
+        <hr />
+        <p><b>Notes for hourly page views</b></p>
+        <p>
+                  Both sets of hourly files have been derived from Domas'
+                  <a href="http://dumps.wikimedia.org/other/pagecounts-raw/">
+                    pagecount/projectcount files</a>
+                  but the format is different.
+                </p>
+        <p>
+                  The huge hourly files for page views per article per wiki
+                  have been massively compressed by merging 720 files per month,
+          thus removing massive redundancy (80% of record space is article
+                  title, and a title can occur in all 720 files).
+          All of this shrinkage without losing hourly granularity.
+                </p>
+        <p>
+                  Line format:
+          <ul>
+            <li>wiki code (subproject.project)</li>
+            <li>article title</li>
+            <li>monthly total (with interpolation when data is missing)</li>
+            <li>hourly counts</li>
+          </ul>
+        </p>
+        <p>
+                  In the wiki code field, the subproject is the language code (fr, el, ja, etc)
+                  or meta, commons etc.
+        </p>
+                <p>
+          The project is one of b (wikibooks), k (wiktionary), n (wikinews), o (wikivoyage), q (wikiquote),
+          s (wikisource), v (wikiversity), z (wikipedia).
+                </p>
+        <p>
+                  Hourly counts can be deciphered as follows:
+                  <dl>
+            <dt>Hour:</dt>
+            <dd>from 0 to 23, written as 0 = A, 1 = B ... 22 = W, 23 = X</dd>
+            <dt>Day:</dt>
+            <dd>from 1 to 31, written as 1 = A, 2 = B ... 25 = Y, 26 = Z, 27 = [, 28 = \, 29 = ], 30 = ^, 31 = _</dd>
+          </dl>
+          Example: 33 views on day 2, hour 4, and 155 views on day 3, hour 7 are coded as 'BE33,CH155'
+                </p>
+        <p>
+                  Source for this information:
+                  <a href="http://lists.wikimedia.org/pipermail/wikitech-l/2011-August/054644.html">
+                    http://lists.wikimedia.org/pipermail/wikitech-l/2011-August/054591.html</a>.
+                </p>
+
+    </small>
+    </body>
 </html>
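
For readers of this change, the hourly-count encoding described in the new
notes can be sketched in Python as below. This is only an illustration, not
part of the commit: decode_hourly_counts is a hypothetical helper name, and it
assumes the field is a comma-separated list of entries, each made of one day
letter, one hour letter and a decimal count, exactly as in the 'BE33,CH155'
example. Real files may contain variations this sketch does not handle.

```python
import re

# One entry = day letter (A..Z, then [ \ ] ^ _ for days 27..31),
# hour letter (A..X for hours 0..23), then a decimal view count.
# Assumption: this layout is inferred from the notes in the diff above.
ENTRY = re.compile(r"([A-Z\[\\\]^_])([A-X])(\d+)")


def decode_hourly_counts(field):
    """Return {(day, hour): views} for a field such as 'BE33,CH155'."""
    counts = {}
    for entry in field.split(","):
        if not entry:
            continue  # tolerate a trailing comma
        match = ENTRY.fullmatch(entry)
        if match is None:
            raise ValueError("unrecognised hourly entry: %r" % entry)
        day = ord(match.group(1)) - ord("A") + 1   # days are 1-based: A = 1
        hour = ord(match.group(2)) - ord("A")      # hours are 0-based: A = 0
        counts[(day, hour)] = int(match.group(3))
    return counts


if __name__ == "__main__":
    print(decode_hourly_counts("BE33,CH155"))
```

The day letters continue past Z into the next ASCII characters, which is why
subtracting ord("A") works for days 27 to 31 as well.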
 

-- 
To view, visit https://gerrit.wikimedia.org/r/190457
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I93a2fcf81efca9884c2da2007f1d0e9a02813d3a
Gerrit-PatchSet: 1
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: ArielGlenn <ar...@wikimedia.org>
Gerrit-Reviewer: ArielGlenn <ar...@wikimedia.org>
Gerrit-Reviewer: jenkins-bot <>

_______________________________________________
MediaWiki-commits mailing list
MediaWiki-commits@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
