Just a heads up:

The user_agent field is PII (privacy sensitive), and as such it is
purged after 90 days. If a user_agent_map field were added, it would
have to be purged after 90 days as well.

Another, more permanent option might be to detect the browser family on the
JavaScript client with e.g. duck-typing[1] and send it as part of the
explicit schema. The browser family by itself is not identifying enough to
be considered PII, and could be kept indefinitely.

[1]
http://stackoverflow.com/questions/9847580/how-to-detect-safari-chrome-ie-firefox-and-opera-browser
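
To make the duck-typing idea concrete, here is a minimal sketch, assuming a
simplified subset of the probes from the answer in [1]. The function name
detectBrowserFamily and the family labels are hypothetical, not part of
EventLogging, and the probes are heuristics that can go stale as browsers
change:

```javascript
// Hypothetical helper: classify the browser family by probing for
// vendor-specific globals (duck typing) instead of parsing the user agent.
// `win` is a window-like object; passing it in keeps the function testable
// outside a browser.
function detectBrowserFamily(win) {
  var doc = win.document || {};

  // Older Firefox versions expose the InstallTrigger global.
  if (typeof win.InstallTrigger !== 'undefined') { return 'Firefox'; }

  // Internet Explorer sets document.documentMode.
  if (doc.documentMode) { return 'IE'; }

  // Legacy Edge exposes StyleMedia; tested after IE, as in [1].
  if (win.StyleMedia) { return 'Edge'; }

  // Opera is tested before Chrome (assumption: Chromium-based Opera may
  // also expose a `chrome` global).
  if (win.opr || win.opera) { return 'Opera'; }

  if (win.chrome) { return 'Chrome'; }

  // Desktop Safari exposes a `safari` global.
  if (win.safari) { return 'Safari'; }

  // Anything unrecognised is reported as a generic bucket.
  return 'Other';
}
```

In a browser this would be called as detectBrowserFamily(window), and the
resulting string could be sent in a schema field (say, a hypothetical
browserFamily property) alongside the rest of the event.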

On Thu, Sep 15, 2016 at 5:40 PM, Jane Darnell <[email protected]> wrote:

> It's not just a question of which value to choose, but also how to sort.
> It would be nice to be able to choose sorting in alphabetical order vs
> numerical order. It would also be nice to assign a default sort to any item
> label that is taken from the Wikipedia {{DEFAULTSORT}} template (though
> that won't work for items without a Wikipedia article).
>
> On Thu, Sep 15, 2016 at 10:18 AM, Dan Andreescu <[email protected]>
> wrote:
>
>> The problem with working on EL data in hive is that the schemas for the
>> tables can change at any point, in backwards-incompatible ways.  And
>> maintaining tables dynamically is harder here than in mysql world (where EL
>> just tries to insert, and creates the table on failure).  So, while it's
>> relatively easy to use ua-parser (see below), you can't easily access EL
>> data in hive tables.  However, we do have all EL data in hadoop, so you can
>> access it with Spark.  Andrew's about to answer with more details on that.
>> I just thought this might be useful if you sqoop EL data from mysql or
>> otherwise import it into a Hive table:
>>
>>
>> from stat1002, start hive, then:
>>
>> ADD JAR /srv/deployment/analytics/refinery/artifacts/org/wikimedia/analytics/refinery/refinery-hive-0.0.35.jar;
>>
>> CREATE TEMPORARY FUNCTION ua_parser as 'org.wikimedia.analytics.refinery.hive.UAParserUDF';
>>
>> select ua_parser('Wikimedia Bot');
>>
>> On Thu, Sep 15, 2016 at 1:06 AM, Federico Leva (Nemo) <[email protected]> wrote:
>>
>>> Tilman Bayer, 15/09/2016 01:21:
>>>
>>>> This came up recently with the Reading web team, for the purpose of
>>>> investigating whether certain issues are caused by certain browsers
>>>> only. But I imagine it has arisen in other places as well.
>>>>
>>>
>>> Definitely. https://www.mediawiki.org/wiki/EventLogging/UserAgentSanitization
>>>
>>> Nemo
>>>
>>>
>>> _______________________________________________
>>> Analytics mailing list
>>> [email protected]
>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>
>>


-- 
*Marcel Ruiz Forns*
Analytics Developer
Wikimedia Foundation
