Re: [Analytics] Traffic device breakdown

Nuria Ruiz Tue, 14 Oct 2014 10:31:26 -0700

>Woah! Nice :D How are definitions updates handled?
Since we talked about this on IRC, restating here to keep the archives
happy.
We pull the ua-parser jar from our Archiva depot. An update will involve
building a new jar, uploading it to Archiva, and updating our dependency
file (pom.xml) to point to the new version.
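
As a rough illustration only, the pom.xml side of such an update might look
like the fragment below. The groupId, artifactId and version strings are
placeholders invented for this sketch, not the coordinates we actually use;
the only real change in an update would be the version element.

```xml
<!-- Hypothetical pom.xml fragment: all coordinates below are placeholders. -->
<dependency>
  <groupId>example.group</groupId>          <!-- assumed, not the real groupId -->
  <artifactId>example-ua-parser</artifactId> <!-- assumed, not the real artifactId -->
  <!-- Bump this to the version of the jar newly uploaded to Archiva. -->
  <version>NEW_VERSION</version>
</dependency>
```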




On Fri, Oct 10, 2014 at 9:59 PM, Oliver Keyes <[email protected]> wrote:

> Woah! Nice :D How are definitions updates handled?
>
> On 10 October 2014 18:58, Nuria Ruiz <[email protected]> wrote:
>
>> >1. A UDF for ua-parser or whatever we decide to use (this will possibly
>> be necessary for pageviews, but not necessarily - it depends on our
>> >spider/automaton detection strategy)
>> We got this one ready today: https://gerrit.wikimedia.org/r/#/c/166142/
>>
>>
>>
>>
>> On Fri, Oct 10, 2014 at 3:55 PM, Oliver Keyes <[email protected]>
>> wrote:
>>
>>>
>>>
>>> On 10 October 2014 16:02, Nuria Ruiz <[email protected]> wrote:
>>>
>>>> >At some point I believe we hope to just, you know. Have a regularly
>>>> updated browser matrix somewhere.
>>>> I REALLY think this should make it into our goals; if it cannot be done
>>>> this quarter it should for sure be done next quarter.
>>>>
>>>>
>>> I agree it would be nice. It's one of those things that will either come
>>> as a side-effect of other stuff, OR require substantially more work, and
>>> nothing in-between. Things we need for it:
>>>
>>> 1. A UDF for ua-parser or whatever we decide to use (this will possibly
>>> be necessary for pageviews, but not necessarily - it depends on our
>>> spider/automaton detection strategy)
>>> 2. Pageviews data
>>> 3. A table somewhere.
>>>
>>> Take 1, apply to 2, stick in 3. Maybe grab the same data for text/html
>>> requests overall (depends on query runtime), maybe don't.
>>>
>>> The *ideal* implementation, obviously, is to pair this up with a site
>>> that automatically parses the results into HTML. That should be the end
>>> goal, but in terms of engineering support we can get most of the way there
>>> simply by ensuring we always have a recent snapshot to hand. I can probably
>>> put something together over the sampled logs and throw it in SQL if there
>>> are urgent needs.
>>>
>>>
>>>> Do we not have more recent data than May?
>>>>
>>>
>>> We don't, but thanks to the utilities library I built, the code for
>>> generating it would literally run:
>>>
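>>> library(WMUtils)
>>> library(data.table) # for as.data.table, in case WMUtils does not attach it
>>> # Read the sampled logs for each day of September 2014, sieve, parse the UAs
>>> uas <- as.data.table(ua_parse(data_sieve(do.call("rbind", lapply(seq(20140901, 20140930, 1), sampled_logs)))$user_agent))
>>>
>>> # Count requests per OS/browser pair
>>> uas <- uas[, j = list(requests = .N), by = c("os", "browser")]
>>>
>>> write.table(uas, file = "uas_for_jon.tsv", sep = "\t", row.names = FALSE,
>>>             quote = TRUE)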
>>>
>>> ...assuming we didn't care about readability.
>>>
>>> Point is, in the time until we have the new parser built into Hadoop and
>>> that setup, we can totally generate interim data from the sampled logs
>>> using the same parser at a tiny cost in research/programming time, iff (the
>>> mathematical if) we need it enough that we're cool with the sampling, and
>>> people can convince [[Dario|Our Great Leader]] to authorise me to spend 15
>>> minutes of my time on it.
>>>
>>>
>>>>
>>>> On Fri, Oct 10, 2014 at 12:45 PM, Oliver Keyes <[email protected]>
>>>> wrote:
>>>>
>>>>> Email Dario and me; if he prioritises it, I'll run a check on more
>>>>> recent data.
>>>>>
>>>>> At some point I believe we hope to just, you know. Have a regularly
>>>>> updated browser matrix somewhere. This comes some time after pageviews
>>>>> though.
>>>>>
>>>>> On 10 October 2014 14:38, Toby Negrin <[email protected]> wrote:
>>>>>
>>>>>> Hi Jon -- I'm sure other folks will have more information but here's
>>>>>> a link to a slide with some data from May[1]. We don't see a lot of
>>>>>> Windows phone traffic.
>>>>>>
>>>>>> -Toby
>>>>>>
>>>>>> [1]
>>>>>> https://docs.google.com/a/wikimedia.org/presentation/d/19tZgTi6VUG04wfGWVzcaZKY26oQiXhPaHI9g2tBmMKE/edit#slide=id.g382406373_08
>>>>>>
>>>>>> On Fri, Oct 10, 2014 at 11:17 AM, Jon Robson <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> I was going through our backlog again today, and I noticed a bug about
>>>>>>> supporting editing on Windows Phones with IE9 [1].
>>>>>>>
>>>>>>> Yet again, I wondered 'how many of our users are using IE9?', and
>>>>>>> whether this lack of support is costing us lots of potential
>>>>>>> editors.
>>>>>>>
>>>>>>> What's the easiest way to get this information now? Is it available?
>>>>>>>
>>>>>>> [1] https://bugzilla.wikimedia.org/show_bug.cgi?id=55599
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Analytics mailing list
>>>>>>> [email protected]
>>>>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Oliver Keyes
>>>>> Research Analyst
>>>>> Wikimedia Foundation
>>>>>
>>>>> _______________________________________________
>>>>> Analytics mailing list
>>>>> [email protected]
>>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Oliver Keyes
>>> Research Analyst
>>> Wikimedia Foundation
>>>
>>
>>
>
>
> --
> Oliver Keyes
> Research Analyst
> Wikimedia Foundation
>

