Nuria has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/351613 )
Change subject: Update daily unique devices druid loading job ...................................................................... Update daily unique devices druid loading job Current job loads every row, while we documented that projects with less than 1000 uniques shouldn't be taken into account because of their variability. This patch prevent those small-number-of-uniques projects to be loaded. See: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Unique_Devices/Last_access_solution#Data_Quality_Analysis Bug: T164183 Change-Id: I4a8359b53d0d37063844304d19a34025a05aab5b --- M oozie/last_access_uniques/druid/daily/generate_daily_druid_uniques.hql 1 file changed, 16 insertions(+), 0 deletions(-) Approvals: Nuria: Verified; Looks good to me, approved diff --git a/oozie/last_access_uniques/druid/daily/generate_daily_druid_uniques.hql b/oozie/last_access_uniques/druid/daily/generate_daily_druid_uniques.hql index 1d670d4..f6e0870 100644 --- a/oozie/last_access_uniques/druid/daily/generate_daily_druid_uniques.hql +++ b/oozie/last_access_uniques/druid/daily/generate_daily_druid_uniques.hql @@ -33,6 +33,20 @@ LOCATION '${destination_directory}'; +WITH filtered_hosts AS ( + SELECT + uri_host as filtered_host, + SUM(uniques_estimate) as checked_uniques_estimate + FROM ${source_table} + WHERE year=${year} + AND month=${month} + AND day=${day} + GROUP BY + uri_host + HAVING + SUM(uniques_estimate) >= 1000 +) + INSERT OVERWRITE TABLE tmp_daily_druid_uniques_${year}_${month}_${day} SELECT CONCAT( @@ -46,6 +60,8 @@ uniques_offset AS uniques_offset, uniques_estimate AS uniques_estimate FROM ${source_table} + INNER JOIN filtered_hosts + ON uri_host = filtered_host WHERE year = ${year} AND month = ${month} AND day = ${day}; -- To view, visit https://gerrit.wikimedia.org/r/351613 To unsubscribe, visit https://gerrit.wikimedia.org/r/settings Gerrit-MessageType: merged Gerrit-Change-Id: I4a8359b53d0d37063844304d19a34025a05aab5b Gerrit-PatchSet: 2 Gerrit-Project: analytics/refinery Gerrit-Branch: master Gerrit-Owner: Joal <[email protected]> Gerrit-Reviewer: Nuria <[email protected]> _______________________________________________ MediaWiki-commits mailing list [email protected] https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
