Re: [Analytics] Parsing user agents in EventLogging data

Dan Andreescu Thu, 15 Sep 2016 07:20:27 -0700

The problem with working on EL data in Hive is that the schemas for the
tables can change at any point, in backwards-incompatible ways.  And
maintaining tables dynamically is harder here than in the MySQL world (where
EL just tries to insert, and creates the table on failure).  So, while it's
relatively easy to use ua-parser (see below), you can't easily access EL
data in Hive tables.  However, we do have all EL data in Hadoop, so you can
access it with Spark.  Andrew's about to answer with more details on that.
I just thought this might be useful if you sqoop EL data from MySQL or
otherwise import it into a Hive table:

From stat1002, start hive, then:

ADD JAR /srv/deployment/analytics/refinery/artifacts/org/wikimedia/analytics/refinery/refinery-hive-0.0.35.jar;

CREATE TEMPORARY FUNCTION ua_parser AS 'org.wikimedia.analytics.refinery.hive.UAParserUDF';

SELECT ua_parser('Wikimedia Bot');
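Once the data is in a Hive table, something like the following should give a browser breakdown. This is only a sketch: the table name my_el_table and the column name userAgent are placeholders for wherever you imported the EL data, and it assumes ua_parser returns a map<string,string> with keys such as browser_family and browser_major. Check the output of the SELECT above for the actual key names before relying on them.

```sql
-- Hypothetical example: my_el_table / userAgent are placeholders, not real names.
-- Assumes ua_parser (registered above) returns a map<string,string>;
-- the key names 'browser_family' and 'browser_major' are assumptions.
SELECT
  ua['browser_family'] AS browser_family,
  ua['browser_major']  AS browser_major,
  COUNT(*)             AS events
FROM (
  -- Parse each user agent once, then reuse the resulting map.
  SELECT ua_parser(userAgent) AS ua
  FROM my_el_table
) parsed
GROUP BY ua['browser_family'], ua['browser_major']
ORDER BY events DESC
LIMIT 20;
```

Parsing in the subquery means the UDF runs once per row rather than once per referenced key.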

On Thu, Sep 15, 2016 at 1:06 AM, Federico Leva (Nemo) wrote:

> Tilman Bayer, 15/09/2016 01:21:
>
>> This came up recently with the Reading web team, for the purpose of
>> investigating whether certain issues are caused by certain browsers
>> only. But I imagine it has arisen in other places as well.
>>
>
> Definitely. https://www.mediawiki.org/wiki/EventLogging/UserAgentSanitization
>
> Nemo
>
>
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics
>