Just a heads up: the user_agent field is PII (privacy-sensitive), and as such it is purged after 90 days. If a user_agent_map field were added, it would have to be purged after 90 days as well.
Another, more permanent option might be to detect the browser family on the JavaScript client with, e.g., duck-typing [1] and send it as part of the explicit schema. The browser family by itself is not identifying enough to be considered PII, and could be kept indefinitely.

[1] http://stackoverflow.com/questions/9847580/how-to-detect-safari-chrome-ie-firefox-and-opera-browser

On Thu, Sep 15, 2016 at 5:40 PM, Jane Darnell <[email protected]> wrote:

> It's not just a question of which value to choose, but also how to sort.
> It would be nice to be able to choose sorting in alphabetical order vs
> numerical order. It would also be nice to assign a default sort to any item
> label that is taken from the Wikipedia {{DEFAULTSORT}} template (though
> that won't work for items without a Wikipedia article).
>
> On Thu, Sep 15, 2016 at 10:18 AM, Dan Andreescu <[email protected]>
> wrote:
>
>> The problem with working on EL data in hive is that the schemas for the
>> tables can change at any point, in backwards-incompatible ways. And
>> maintaining tables dynamically is harder here than in mysql world (where EL
>> just tries to insert, and creates the table on failure). So, while it's
>> relatively easy to use ua-parser (see below), you can't easily access EL
>> data in hive tables. However, we do have all EL data in hadoop, so you can
>> access it with Spark. Andrew's about to answer with more details on that.
>> I just thought this might be useful if you sqoop EL data from mysql or
>> otherwise import it into a Hive table:
>>
>>
>> from stat1002, start hive, then:
>>
>> ADD JAR /srv/deployment/analytics/refinery/artifacts/org/wikimedia/analytics/refinery/refinery-hive-0.0.35.jar;
>>
>> CREATE TEMPORARY FUNCTION ua_parser as 'org.wikimedia.analytics.refinery.hive.UAParserUDF';
>>
>> select ua_parser('Wikimedia Bot');
>>
>> On Thu, Sep 15, 2016 at 1:06 AM, Federico Leva (Nemo) <[email protected]>
>> wrote:
>>
>>> Tilman Bayer, 15/09/2016 01:21:
>>>
>>>> This came up recently with the Reading web team, for the purpose of
>>>> investigating whether certain issues are caused by certain browsers
>>>> only. But I imagine it has arisen in other places as well.
>>>>
>>>
>>> Definitely. https://www.mediawiki.org/wiki/EventLogging/UserAgentSanitization
>>>
>>> Nemo
>>>
>>>
>>> _______________________________________________
>>> Analytics mailing list
>>> [email protected]
>>> https://lists.wikimedia.org/mailman/listinfo/analytics

--
*Marcel Ruiz Forns*
Analytics Developer
Wikimedia Foundation
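
To make the duck-typing suggestion more concrete, here is a minimal sketch of what client-side detection could look like. It is only an illustration under stated assumptions: the function name detectBrowserFamily, the returned family labels, and the browserFamily field are hypothetical and not part of any existing schema, and the feature probes are era-specific heuristics in the spirit of the Stack Overflow answer [1] that would need to be re-verified against current browsers.

```javascript
// Hypothetical helper: guesses the browser family from feature probes
// ("duck-typing") instead of parsing the user agent string.
// Assumption: the probes below are heuristics that go stale as browsers
// change; they are not an authoritative detection method.
function detectBrowserFamily(win) {
  var doc = win.document || {};

  // Opera exposes `opr` (Blink-based) or `opera` (Presto-based).
  // Checked before Chrome because Blink-based Opera also defines `chrome`.
  if (win.opr || win.opera) {
    return 'Opera';
  }
  // Older Firefox versions expose a global `InstallTrigger`.
  if (typeof win.InstallTrigger !== 'undefined') {
    return 'Firefox';
  }
  // Internet Explorer sets `document.documentMode`.
  if (doc.documentMode) {
    return 'IE';
  }
  // Pre-Chromium Edge exposes `StyleMedia`; IE was already ruled out above.
  if (win.StyleMedia) {
    return 'Edge';
  }
  if (win.chrome) {
    return 'Chrome';
  }
  if (win.safari) {
    return 'Safari';
  }
  return 'Other';
}

// Hypothetical event payload: only the coarse family is kept,
// never the raw user agent string.
function buildEventData(win) {
  return { browserFamily: detectBrowserFamily(win) };
}
```

In a browser this would be called as buildEventData(window), and the result merged into the event before it is logged. Taking the window object as a parameter keeps the probes testable outside a browser.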
