Milimetric has submitted this change and it was merged.

Change subject: Change line- and tabular- browser reports to percent
......................................................................


Change line- and tabular- browser reports to percent

Make the queries that feed the line- and tabular- charts in the
browser reports dashboard return percentages instead of absolute
values. Note that before, some queries were used for both the
tabular chart and the sunburst (hierarchy) chart. Now, those
queries have been duplicated, one for the tabular view (percent)
and another for the hierarchy view (absolute numbers).

This change also removes the deprecated config options:
frequency and timeboxed.

Bug: T130406
Change-Id: Ia16952bf4a4b531a2cfba27212802f949950433b
---
D browser/all_sites_by_browser_family
M browser/all_sites_by_browser_family_and_major
A browser/all_sites_by_browser_family_and_major_percent
A browser/all_sites_by_browser_family_percent
M browser/all_sites_by_os_and_browser
A browser/all_sites_by_os_and_browser_percent
D browser/all_sites_by_os_family
M browser/all_sites_by_os_family_and_major
A browser/all_sites_by_os_family_and_major_percent
A browser/all_sites_by_os_family_percent
M browser/config.yaml
M browser/dynamic_pivot.py
D browser/mobile_site_by_browser_family
M browser/mobile_site_by_browser_family_and_major
A browser/mobile_site_by_browser_family_and_major_percent
A browser/mobile_site_by_browser_family_percent
D browser/mobile_site_by_os_family
M browser/mobile_site_by_os_family_and_major
A browser/mobile_site_by_os_family_and_major_percent
A browser/mobile_site_by_os_family_percent
20 files changed, 336 insertions(+), 111 deletions(-)

Approvals:
  Milimetric: Verified; Looks good to me, approved



diff --git a/browser/all_sites_by_browser_family 
b/browser/all_sites_by_browser_family
deleted file mode 100755
index 257b8eb..0000000
--- a/browser/all_sites_by_browser_family
+++ /dev/null
@@ -1,18 +0,0 @@
-#!/bin/bash
-hive -e "
-    SELECT
-        '$1' AS date,
-        browser_family,
-        SUM(view_count) as view_count
-    FROM wmf.browser_general
-    WHERE
-        access_method IN ('desktop', 'mobile web') AND
-        CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) >= '$1' 
AND
-        CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) < '$2'
-    GROUP BY
-        '$1',
-        browser_family
-    ORDER BY view_count DESC
-    LIMIT 1000
-    ;
-" 2> /dev/null | python $3/dynamic_pivot.py
diff --git a/browser/all_sites_by_browser_family_and_major 
b/browser/all_sites_by_browser_family_and_major
index cf140dd..a986b2f 100755
--- a/browser/all_sites_by_browser_family_and_major
+++ b/browser/all_sites_by_browser_family_and_major
@@ -17,4 +17,4 @@
     ORDER BY view_count DESC
     LIMIT 1000
     ;
-" 2> /dev/null
+" 2> /dev/null | grep -v parquet.hadoop
diff --git a/browser/all_sites_by_browser_family_and_major_percent 
b/browser/all_sites_by_browser_family_and_major_percent
new file mode 100755
index 0000000..01b1ae0
--- /dev/null
+++ b/browser/all_sites_by_browser_family_and_major_percent
@@ -0,0 +1,31 @@
+#!/bin/bash
+hive -e "
+    SET hive.mapred.mode = nonstrict;
+    WITH
+        slice AS (
+            SELECT *
+            FROM wmf.browser_general
+            WHERE
+                access_method IN ('desktop', 'mobile web') AND
+                CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) 
>= '$1' AND
+                CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) 
< '$2'
+        ),
+        total AS (
+            SELECT SUM(view_count) as view_count_total
+            FROM slice
+        )
+    SELECT
+        '$1' AS date,
+        browser_family,
+        browser_major,
+        SUM(view_count) * 100 / view_count_total AS percent
+    FROM slice JOIN total
+    GROUP BY
+        '$1',
+        browser_family,
+        browser_major,
+        view_count_total
+    ORDER BY percent DESC
+    LIMIT 1000
+    ;
+" 2> /dev/null | grep -v parquet.hadoop
diff --git a/browser/all_sites_by_browser_family_percent 
b/browser/all_sites_by_browser_family_percent
new file mode 100755
index 0000000..7ae625e
--- /dev/null
+++ b/browser/all_sites_by_browser_family_percent
@@ -0,0 +1,29 @@
+#!/bin/bash
+hive -e "
+    SET hive.mapred.mode = nonstrict;
+    WITH
+        slice AS (
+            SELECT *
+            FROM wmf.browser_general
+            WHERE
+                access_method IN ('desktop', 'mobile web') AND
+                CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) 
>= '$1' AND
+                CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) 
< '$2'
+        ),
+        total AS (
+            SELECT SUM(view_count) as view_count_total
+            FROM slice
+        )
+    SELECT
+        '$1' AS date,
+        browser_family,
+        SUM(view_count) * 100 / view_count_total AS percent
+    FROM slice JOIN total
+    GROUP BY
+        '$1',
+        browser_family,
+        view_count_total
+    ORDER BY percent DESC
+    LIMIT 1000
+    ;
+" 2> /dev/null | grep -v parquet.hadoop | python $3/dynamic_pivot.py
diff --git a/browser/all_sites_by_os_and_browser 
b/browser/all_sites_by_os_and_browser
index 4b97456..2acfd56 100755
--- a/browser/all_sites_by_os_and_browser
+++ b/browser/all_sites_by_os_and_browser
@@ -21,4 +21,4 @@
     ORDER BY view_count DESC
     LIMIT 1000
     ;
-" 2> /dev/null
+" 2> /dev/null | grep -v parquet.hadoop
diff --git a/browser/all_sites_by_os_and_browser_percent 
b/browser/all_sites_by_os_and_browser_percent
new file mode 100755
index 0000000..317a488
--- /dev/null
+++ b/browser/all_sites_by_os_and_browser_percent
@@ -0,0 +1,35 @@
+#!/bin/bash
+hive -e "
+    SET hive.mapred.mode = nonstrict;
+    WITH
+        slice AS (
+            SELECT *
+            FROM wmf.browser_general
+            WHERE
+                access_method IN ('desktop', 'mobile web') AND
+                CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) 
>= '$1' AND
+                CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) 
< '$2'
+        ),
+        total AS (
+            SELECT SUM(view_count) as view_count_total
+            FROM slice
+        )
+    SELECT
+        '$1' AS date,
+        os_family,
+        os_major,
+        browser_family,
+        browser_major,
+        SUM(view_count) * 100 / view_count_total AS percent
+    FROM slice JOIN total
+    GROUP BY
+        '$1',
+        os_family,
+        os_major,
+        browser_family,
+        browser_major,
+        view_count_total
+    ORDER BY percent DESC
+    LIMIT 1000
+    ;
+" 2> /dev/null | grep -v parquet.hadoop
diff --git a/browser/all_sites_by_os_family b/browser/all_sites_by_os_family
deleted file mode 100755
index 0162f76..0000000
--- a/browser/all_sites_by_os_family
+++ /dev/null
@@ -1,18 +0,0 @@
-#!/bin/bash
-hive -e "
-    SELECT
-        '$1' AS date,
-        os_family,
-        SUM(view_count) as view_count
-    FROM wmf.browser_general
-    WHERE
-        access_method IN ('desktop', 'mobile web') AND
-        CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) >= '$1' 
AND
-        CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) < '$2'
-    GROUP BY
-        '$1',
-        os_family
-    ORDER BY view_count DESC
-    LIMIT 1000
-    ;
-" 2> /dev/null | python $3/dynamic_pivot.py
diff --git a/browser/all_sites_by_os_family_and_major 
b/browser/all_sites_by_os_family_and_major
index 010ae6c..098d959 100755
--- a/browser/all_sites_by_os_family_and_major
+++ b/browser/all_sites_by_os_family_and_major
@@ -17,4 +17,4 @@
     ORDER BY view_count DESC
     LIMIT 1000
     ;
-" 2> /dev/null
+" 2> /dev/null | grep -v parquet.hadoop
diff --git a/browser/all_sites_by_os_family_and_major_percent 
b/browser/all_sites_by_os_family_and_major_percent
new file mode 100755
index 0000000..fe6a539
--- /dev/null
+++ b/browser/all_sites_by_os_family_and_major_percent
@@ -0,0 +1,31 @@
+#!/bin/bash
+hive -e "
+    SET hive.mapred.mode = nonstrict;
+    WITH
+        slice AS (
+            SELECT *
+            FROM wmf.browser_general
+            WHERE
+                access_method IN ('desktop', 'mobile web') AND
+                CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) 
>= '$1' AND
+                CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) 
< '$2'
+        ),
+        total AS (
+            SELECT SUM(view_count) as view_count_total
+            FROM slice
+        )
+    SELECT
+        '$1' AS date,
+        os_family,
+        os_major,
+        SUM(view_count) * 100 / view_count_total AS percent
+    FROM slice JOIN total
+    GROUP BY
+        '$1',
+        os_family,
+        os_major,
+        view_count_total
+    ORDER BY percent DESC
+    LIMIT 1000
+    ;
+" 2> /dev/null | grep -v parquet.hadoop
diff --git a/browser/all_sites_by_os_family_percent 
b/browser/all_sites_by_os_family_percent
new file mode 100755
index 0000000..b68af4b
--- /dev/null
+++ b/browser/all_sites_by_os_family_percent
@@ -0,0 +1,29 @@
+#!/bin/bash
+hive -e "
+    SET hive.mapred.mode = nonstrict;
+    WITH
+        slice AS (
+            SELECT *
+            FROM wmf.browser_general
+            WHERE
+                access_method IN ('desktop', 'mobile web') AND
+                CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) 
>= '$1' AND
+                CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) 
< '$2'
+        ),
+        total AS (
+            SELECT SUM(view_count) as view_count_total
+            FROM slice
+        )
+    SELECT
+        '$1' AS date,
+        os_family,
+        SUM(view_count) * 100 / view_count_total AS percent
+    FROM slice JOIN total
+    GROUP BY
+        '$1',
+        os_family,
+        view_count_total
+    ORDER BY percent DESC
+    LIMIT 1000
+    ;
+" 2> /dev/null | grep -v parquet.hadoop | python $3/dynamic_pivot.py
diff --git a/browser/config.yaml b/browser/config.yaml
index 6ba66eb..fbfbade 100644
--- a/browser/config.yaml
+++ b/browser/config.yaml
@@ -2,77 +2,94 @@
 
     all_sites_by_browser_family_and_major:
         type: script
-        frequency: weeks
         granularity: weeks
-        timeboxed: true
+        funnel: true
+        starts: 2015-06-07
+        lag: 86400 # wait 1 day to compute last week
+
+    all_sites_by_browser_family_and_major_percent:
+        type: script
+        granularity: weeks
+        funnel: true
+        starts: 2015-06-07
+        lag: 86400 # wait 1 day to compute last week
+
+    all_sites_by_browser_family_percent:
+        type: script
+        granularity: weeks
+        starts: 2015-06-07
+        lag: 86400 # wait 1 day to compute last week
+
+    all_sites_by_os_and_browser:
+        type: script
+        granularity: weeks
+        funnel: true
+        starts: 2015-06-07
+        lag: 86400 # wait 1 day to compute last week
+
+    all_sites_by_os_and_browser_percent:
+        type: script
+        granularity: weeks
         funnel: true
         starts: 2015-06-07
         lag: 86400 # wait 1 day to compute last week
 
     all_sites_by_os_family_and_major:
         type: script
-        frequency: weeks
         granularity: weeks
-        timeboxed: true
         funnel: true
         starts: 2015-06-07
         lag: 86400 # wait 1 day to compute last week
 
-    all_sites_by_os_and_browser:
+    all_sites_by_os_family_and_major_percent:
         type: script
-        frequency: weeks
         granularity: weeks
-        timeboxed: true
         funnel: true
+        starts: 2015-06-07
+        lag: 86400 # wait 1 day to compute last week
+
+    all_sites_by_os_family_percent:
+        type: script
+        granularity: weeks
         starts: 2015-06-07
         lag: 86400 # wait 1 day to compute last week
 
     mobile_site_by_browser_family_and_major:
         type: script
-        frequency: weeks
         granularity: weeks
-        timeboxed: true
         funnel: true
+        starts: 2015-06-07
+        lag: 86400 # wait 1 day to compute last week
+
+    mobile_site_by_browser_family_and_major_percent:
+        type: script
+        granularity: weeks
+        funnel: true
+        starts: 2015-06-07
+        lag: 86400 # wait 1 day to compute last week
+
+    mobile_site_by_browser_family_percent:
+        type: script
+        granularity: weeks
         starts: 2015-06-07
         lag: 86400 # wait 1 day to compute last week
 
     mobile_site_by_os_family_and_major:
         type: script
-        frequency: weeks
         granularity: weeks
-        timeboxed: true
         funnel: true
         starts: 2015-06-07
         lag: 86400 # wait 1 day to compute last week
 
-    all_sites_by_browser_family:
+    mobile_site_by_os_family_and_major_percent:
         type: script
-        frequency: weeks
         granularity: weeks
-        timeboxed: true
+        funnel: true
         starts: 2015-06-07
         lag: 86400 # wait 1 day to compute last week
 
-    all_sites_by_os_family:
+    mobile_site_by_os_family_percent:
         type: script
-        frequency: weeks
         granularity: weeks
-        timeboxed: true
-        starts: 2015-06-07
-        lag: 86400 # wait 1 day to compute last week
-
-    mobile_site_by_browser_family:
-        type: script
-        frequency: weeks
-        granularity: weeks
-        timeboxed: true
-        starts: 2015-06-07
-        lag: 86400 # wait 1 day to compute last week
-
-    mobile_site_by_os_family:
-        type: script
-        frequency: weeks
-        granularity: weeks
-        timeboxed: true
         starts: 2015-06-07
         lag: 86400 # wait 1 day to compute last week
diff --git a/browser/dynamic_pivot.py b/browser/dynamic_pivot.py
index a2b53ee..8c45101 100644
--- a/browser/dynamic_pivot.py
+++ b/browser/dynamic_pivot.py
@@ -13,7 +13,12 @@
        else:
                date, category, value = row
                dates.add(date)
-               pivoted[category][date] += int(value)
+               try:
+                       number = int(value)
+               except:
+                       number = float(value)
+               finally:
+                       pivoted[category][date] += number
 
 categories = sorted(pivoted.keys())
 print header[0] + '\t' + '\t'.join(categories)
diff --git a/browser/mobile_site_by_browser_family 
b/browser/mobile_site_by_browser_family
deleted file mode 100755
index 28887eb..0000000
--- a/browser/mobile_site_by_browser_family
+++ /dev/null
@@ -1,18 +0,0 @@
-#!/bin/bash
-hive -e "
-    SELECT
-        '$1' AS date,
-        browser_family,
-        SUM(view_count) as view_count
-    FROM wmf.browser_general
-    WHERE
-        access_method = 'mobile web' AND
-        CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) >= '$1' 
AND
-        CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) < '$2'
-    GROUP BY
-        '$1',
-        browser_family
-    ORDER BY view_count DESC
-    LIMIT 1000
-    ;
-" 2> /dev/null | python $3/dynamic_pivot.py
diff --git a/browser/mobile_site_by_browser_family_and_major 
b/browser/mobile_site_by_browser_family_and_major
index b755052..6c16159 100755
--- a/browser/mobile_site_by_browser_family_and_major
+++ b/browser/mobile_site_by_browser_family_and_major
@@ -17,4 +17,4 @@
     ORDER BY view_count DESC
     LIMIT 1000
     ;
-" 2> /dev/null
+" 2> /dev/null | grep -v parquet.hadoop
diff --git a/browser/mobile_site_by_browser_family_and_major_percent 
b/browser/mobile_site_by_browser_family_and_major_percent
new file mode 100755
index 0000000..664efb9
--- /dev/null
+++ b/browser/mobile_site_by_browser_family_and_major_percent
@@ -0,0 +1,31 @@
+#!/bin/bash
+hive -e "
+    SET hive.mapred.mode = nonstrict;
+    WITH
+        slice AS (
+            SELECT *
+            FROM wmf.browser_general
+            WHERE
+                access_method = 'mobile web' AND
+                CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) 
>= '$1' AND
+                CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) 
< '$2'
+        ),
+        total AS (
+            SELECT SUM(view_count) as view_count_total
+            FROM slice
+        )
+    SELECT
+        '$1' AS date,
+        browser_family,
+        browser_major,
+        SUM(view_count) * 100 / view_count_total AS percent
+    FROM slice JOIN total
+    GROUP BY
+        '$1',
+        browser_family,
+        browser_major,
+        view_count_total
+    ORDER BY percent DESC
+    LIMIT 1000
+    ;
+" 2> /dev/null | grep -v parquet.hadoop
diff --git a/browser/mobile_site_by_browser_family_percent 
b/browser/mobile_site_by_browser_family_percent
new file mode 100755
index 0000000..8b15133
--- /dev/null
+++ b/browser/mobile_site_by_browser_family_percent
@@ -0,0 +1,29 @@
+#!/bin/bash
+hive -e "
+    SET hive.mapred.mode = nonstrict;
+    WITH
+        slice AS (
+            SELECT *
+            FROM wmf.browser_general
+            WHERE
+                access_method = 'mobile web' AND
+                CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) 
>= '$1' AND
+                CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) 
< '$2'
+        ),
+        total AS (
+            SELECT SUM(view_count) as view_count_total
+            FROM slice
+        )
+    SELECT
+        '$1' AS date,
+        browser_family,
+        SUM(view_count) * 100 / view_count_total AS percent
+    FROM slice JOIN total
+    GROUP BY
+        '$1',
+        browser_family,
+        view_count_total
+    ORDER BY percent DESC
+    LIMIT 1000
+    ;
+" 2> /dev/null | grep -v parquet.hadoop | python $3/dynamic_pivot.py
diff --git a/browser/mobile_site_by_os_family b/browser/mobile_site_by_os_family
deleted file mode 100755
index a0fd6ae..0000000
--- a/browser/mobile_site_by_os_family
+++ /dev/null
@@ -1,18 +0,0 @@
-#!/bin/bash
-hive -e "
-    SELECT
-        '$1' AS date,
-        os_family,
-        SUM(view_count) as view_count
-    FROM wmf.browser_general
-    WHERE
-        access_method = 'mobile web' AND
-        CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) >= '$1' 
AND
-        CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) < '$2'
-    GROUP BY
-        '$1',
-        os_family
-    ORDER BY view_count DESC
-    LIMIT 1000
-    ;
-" 2> /dev/null | python $3/dynamic_pivot.py
diff --git a/browser/mobile_site_by_os_family_and_major 
b/browser/mobile_site_by_os_family_and_major
index 763e515..c735233 100755
--- a/browser/mobile_site_by_os_family_and_major
+++ b/browser/mobile_site_by_os_family_and_major
@@ -17,4 +17,4 @@
     ORDER BY view_count DESC
     LIMIT 1000
     ;
-" 2> /dev/null
+" 2> /dev/null | grep -v parquet.hadoop
diff --git a/browser/mobile_site_by_os_family_and_major_percent 
b/browser/mobile_site_by_os_family_and_major_percent
new file mode 100755
index 0000000..9fca96e
--- /dev/null
+++ b/browser/mobile_site_by_os_family_and_major_percent
@@ -0,0 +1,31 @@
+#!/bin/bash
+hive -e "
+    SET hive.mapred.mode = nonstrict;
+    WITH
+        slice AS (
+            SELECT *
+            FROM wmf.browser_general
+            WHERE
+                access_method = 'mobile web' AND
+                CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) 
>= '$1' AND
+                CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) 
< '$2'
+        ),
+        total AS (
+            SELECT SUM(view_count) as view_count_total
+            FROM slice
+        )
+    SELECT
+        '$1' AS date,
+        os_family,
+        os_major,
+        SUM(view_count) * 100 / view_count_total AS percent
+    FROM slice JOIN total
+    GROUP BY
+        '$1',
+        os_family,
+        os_major,
+        view_count_total
+    ORDER BY percent DESC
+    LIMIT 1000
+    ;
+" 2> /dev/null | grep -v parquet.hadoop
diff --git a/browser/mobile_site_by_os_family_percent 
b/browser/mobile_site_by_os_family_percent
new file mode 100755
index 0000000..6fcf03f
--- /dev/null
+++ b/browser/mobile_site_by_os_family_percent
@@ -0,0 +1,29 @@
+#!/bin/bash
+hive -e "
+    SET hive.mapred.mode = nonstrict;
+    WITH
+        slice AS (
+            SELECT *
+            FROM wmf.browser_general
+            WHERE
+                access_method = 'mobile web' AND
+                CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) 
>= '$1' AND
+                CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) 
< '$2'
+        ),
+        total AS (
+            SELECT SUM(view_count) as view_count_total
+            FROM slice
+        )
+    SELECT
+        '$1' AS date,
+        os_family,
+        SUM(view_count) * 100 / view_count_total AS percent
+    FROM slice JOIN total
+    GROUP BY
+        '$1',
+        os_family,
+        view_count_total
+    ORDER BY percent DESC
+    LIMIT 1000
+    ;
+" 2> /dev/null | grep -v parquet.hadoop | python $3/dynamic_pivot.py

-- 
To view, visit https://gerrit.wikimedia.org/r/280386
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ia16952bf4a4b531a2cfba27212802f949950433b
Gerrit-PatchSet: 6
Gerrit-Project: analytics/reportupdater-queries
Gerrit-Branch: master
Gerrit-Owner: Mforns <[email protected]>
Gerrit-Reviewer: Mforns <[email protected]>
Gerrit-Reviewer: Milimetric <[email protected]>
Gerrit-Reviewer: Nuria <[email protected]>

_______________________________________________
MediaWiki-commits mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits

Reply via email to