Re: [Analytics] Traffic device breakdown

Oliver Keyes Fri, 10 Oct 2014 15:55:29 -0700

On 10 October 2014 16:02, Nuria Ruiz <[email protected]> wrote:

> >At some point I believe we hope to just, you know. Have a regularly
> updated browser matrix somewhere.
> I REALLY think this should make it into our goals, if it cannot be done
> this quarter it should for sure be done next quarter.
>
>
I agree it would be nice. It's one of those things that will either come as
a side-effect of other stuff, OR require substantially more work, and
nothing in-between. Things we need for it:

1. A UDF for ua-parser or whatever we decide to use (this will possibly be
necessary for pageviews, but not necessarily - it depends on our
spider/automaton detection strategy)
2. Pageviews data
3. A table somewhere.

Take 1, apply to 2, stick in 3. Maybe grab the same data for text/html
requests overall (depends on query runtime), maybe don't.

The *ideal* implementation, obviously, is to pair this up with a site that
automatically parses the results into HTML. That should be the end goal,
but in terms of engineering support we can get most of the way there simply
by ensuring we always have a recent snapshot to hand. I can probably put
something together over the sampled logs and throw it in SQL if there are
urgent needs.

> Do we not have more recent data than May?
>

We don't, but thanks to the utilities library I built, the code for
generating it would literally run:

library(WMUtils)
uas <-
as.data.table(ua_parse(data_sieve(do.call("rbind",lapply(seq(20140901,20140930,1),sampled_logs)))$user_agent))

uas <- uas[, j = list(requests = .N), by = c("os","browser")]

write.table(uas, file = "uas_for_jon.tsv", sep = "\t", row.names = FALSE,
quote = TRUE)

...assuming we didn't care about readability.
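
For anyone who wants to see the counting and export steps in isolation, here is a minimal sketch in base R that needs neither WMUtils nor data.table. The rows in the toy data frame are invented for illustration and are not real traffic data; the output file name and the use of tempdir() are likewise assumptions, not part of the snippet above.

```r
# Toy stand-in for the parsed user agents. These rows are made up for
# illustration only; real input would come from the sampled logs.
uas <- data.frame(
  os      = c("Android", "Android", "iOS", "Windows Phone", "iOS"),
  browser = c("Chrome Mobile", "Chrome Mobile", "Mobile Safari",
              "IE Mobile", "Mobile Safari"),
  stringsAsFactors = FALSE
)

# Count requests per (os, browser) pair. This is a base-R analogue of
# uas[, list(requests = .N), by = c("os", "browser")] in data.table:
# give every row a count of 1, then sum within each group.
counts <- aggregate(requests ~ os + browser,
                    data = transform(uas, requests = 1L),
                    FUN = sum)

# Put the largest groups first.
counts <- counts[order(-counts$requests), ]

# Write a tab-separated file. The file name (hypothetical) must be a
# quoted string; a temporary directory keeps the example side-effect free.
out_file <- file.path(tempdir(), "uas_toy.tsv")
write.table(counts, file = out_file, sep = "\t", row.names = FALSE,
            quote = TRUE)
```

The same shape should carry over to the real data: swap the toy data frame for the output of the parser and the grouping and export lines stay as they are.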

Point is, in the time until we have the new parser built into Hadoop and
that setup, we can totally generate interim data from the sampled logs
using the same parser at a tiny cost in research/programming time, iff (the
mathematical if) we need it enough that we're cool with the sampling, and
people can convince [[Dario|Our Great Leader]] to authorise me to spend 15
minutes of my time on it.

>
> On Fri, Oct 10, 2014 at 12:45 PM, Oliver Keyes <[email protected]>
> wrote:
>
>> Email Dario and I, if he prioritises it I'll run a check on more recent
>> data.
>>
>> At some point I believe we hope to just, you know. Have a regularly
>> updated browser matrix somewhere. This comes some time after pageviews
>> though.
>>
>> On 10 October 2014 14:38, Toby Negrin <[email protected]> wrote:
>>
>>> Hi Jon -- I'm sure other folks will have more information but here's a
>>> link to a slide with some data from May[1]. We don't see a lot of Windows
>>> phone traffic.
>>>
>>> -Toby
>>>
>>> [1]
>>> https://docs.google.com/a/wikimedia.org/presentation/d/19tZgTi6VUG04wfGWVzcaZKY26oQiXhPaHI9g2tBmMKE/edit#slide=id.g382406373_08
>>>
>>> On Fri, Oct 10, 2014 at 11:17 AM, Jon Robson <[email protected]>
>>> wrote:
>>>
>>>> I was going through our backlog again today, and I noticed a bug about
>>>> supporting editing on Windows Phones with IE9 [1]
>>>>
>>>> Yet again, I wondered 'how many of our users are using IE9' as I
>>>> wondered if because of this lack of support we are losing out on lots
>>>> of potential editors.
>>>>
>>>> What's the easiest way to get this information now? Is it available?
>>>>
>>>> [1] https://bugzilla.wikimedia.org/show_bug.cgi?id=55599
>>>>
>>>> _______________________________________________
>>>> Analytics mailing list
>>>> [email protected]
>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>>
>>>
>>>
>>
>>
>> --
>> Oliver Keyes
>> Research Analyst
>> Wikimedia Foundation
>>
>>
>>
>

-- 
Oliver Keyes
Research Analyst
Wikimedia Foundation

