Ottomata has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/362310 )
Change subject: Adding "tags" column to webrequest
......................................................................
Adding "tags" column to webrequest
This column holds an array of strings that we call tags.
It will be populated by a UDF that understands webrequest
data and classifies requests into types such as "portal"
and "wikidata". The tags are used by a job that splits
webrequest into smaller subsets.
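As an illustration of how the new column could be read once it is
populated, here is a minimal HiveQL sketch. It is not part of this
change: the bare table name `webrequest` and the partition value
'text' are assumptions, and the splitting job itself may work quite
differently. `array_contains` is a Hive built-in function.

```sql
-- Hypothetical query, not taken from this change.
-- Assumptions: the table is reachable as `webrequest`, and 'text'
-- is a valid value of the `webrequest_source` partition column.
SELECT
    normalized_host.project_class AS project_class,
    normalized_host.project       AS project,
    tags
FROM webrequest
WHERE webrequest_source = 'text'         -- restrict the scan to one partition
  AND array_contains(tags, 'portal')     -- keep only requests tagged "portal"
LIMIT 10;
```

Filtering on a single tag in this way is one possible basis for
writing each subset to its own, smaller table.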
Bug: T164021
Change-Id: Ie855d6b3a2d12921a4a89de3f84ec5ff5d1fe01a
---
M hive/webrequest/create_webrequest_table.hql
1 file changed, 3 insertions(+), 1 deletion(-)
Approvals:
Ottomata: Verified; Looks good to me, approved
diff --git a/hive/webrequest/create_webrequest_table.hql b/hive/webrequest/create_webrequest_table.hql
index bb6c0b0..b9b8ca6 100644
--- a/hive/webrequest/create_webrequest_table.hql
+++ b/hive/webrequest/create_webrequest_table.hql
@@ -54,7 +54,9 @@
`normalized_host` struct<project_class: string, project:string, qualifiers: array<string>, tld: String> COMMENT 'struct containing project_class (such as wikipedia or wikidata for instance), project (such as en or commons), qualifiers (a list of in-between values, such as m and/or zero) and tld (org most often)',
`pageview_info` map<string, string> COMMENT 'map containing project, language_variant and page_title values only when is_pageview = TRUE.',
`page_id` bigint COMMENT 'MediaWiki page_id for this page title. For redirects this could be the page_id of the redirect or the page_id of the target. This may not always be set, even if the page is actually a pageview.',
- `namespace_id` int COMMENT 'MediaWiki namespace_id for this page title. This may not always be set, even if the page is actually a pageview.'
+ `namespace_id` int COMMENT 'MediaWiki namespace_id for this page title. This may not always be set, even if the page is actually a pageview.',
+ `tags` array<string> COMMENT 'List containing tags qualifying the request, ex: ['portal', 'wikidata']. Will be used to split webrequest into smaller subsets.'
+
)
PARTITIONED BY (
`webrequest_source` string COMMENT 'Source cluster',
--
To view, visit https://gerrit.wikimedia.org/r/362310
Gerrit-MessageType: merged
Gerrit-Change-Id: Ie855d6b3a2d12921a4a89de3f84ec5ff5d1fe01a
Gerrit-PatchSet: 5
Gerrit-Project: analytics/refinery
Gerrit-Branch: master
Gerrit-Owner: Nuria
Gerrit-Reviewer: Joal
Gerrit-Reviewer: Mforns
Gerrit-Reviewer: Nuria
Gerrit-Reviewer: Ottomata