On Wed, Oct 15, 2014 at 12:12 PM, Andrew Otto <[email protected]> wrote:
> Jon,
>
> Recent unsampled webrequest logs are available for querying in Hive now!
>
> https://wikitech.wikimedia.org/wiki/Analytics/Cluster
>
> :)
>
> If you don’t already have access for this, submit an RT request to get access
> to stat1002 and the analytics-privatedata-group.
>
That's good to know. Thanks. I'm not sure if I have stat1002 access, but every time you mention RT I shudder ;-)

Thanks for the dump of data, Nuria. I assume these all add up to 100% (roughly) and are global?

So if I understand correctly, if I get the above access and follow your instructions, I can get this data whenever I need it until we have some nice page I can go to to retrieve it :). This is good to know for when we have these sorts of questions, so thanks a bunch. We are currently interested in phablet traffic (big-screen mobile devices), so this should be useful information for us. Thanks!

On Thu, Oct 16, 2014 at 7:15 PM, Nuria Ruiz <[email protected]> wrote:
>>And I have no idea what our traffic for
>>Android 2.1 and 2.2 is and if it is significant e.g. more than 1% of
>>our traffic.
> So the answer to this question (with preliminary data) is that neither 2.1
> nor 2.2 reaches 0.05% of traffic to the mobile site.
>
> I have attached the list of user agents and devices (with percentages) for
> the last 30 days. I did not include any device/browser combo with less than
> 0.05% of traffic.
>
> For about 4% of traffic we could not identify the browser. This might be
> because the user agent was missing or because ua-parser could not figure it
> out. I understand this is not ideal, but I am sending this because I feel
> this list provides quite a bit of value and should help you triage bugs.
>
> iOS takes the cake, which never ceases to amaze me.
>
> I described what I did to gather the data here (anyone with access to
> stat1002 can reproduce it):
> https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hive/QueryUsingUDF
>
>
> On Wed, Oct 15, 2014 at 12:15 PM, Nuria Ruiz <[email protected]> wrote:
>>
>> >And I have no idea what our traffic for
>> >Android 2.1 and 2.2 is and if it is significant e.g. more than 1% of
>> >our traffic.
>> Understood, it is hard for you guys to work without knowing this data.
>> I will try to get a user agent list for data from last month but, as I
>> mentioned earlier, I think providing this data on a regular basis
>> (monthly?) is a good goal for us.
>>
>> On Wed, Oct 15, 2014 at 10:35 AM, Jon Robson <[email protected]>
>> wrote:
>>>
>>> Anything would be useful. I just hit this situation again. I was
>>> reviewing some code and someone used JSON.stringify, which is not
>>> available in Android < 2.3, and I have no idea what our traffic for
>>> Android 2.1 and 2.2 is and if it is significant, e.g. more than 1% of
>>> our traffic.
>>>
>>> In the meantime, while I don't have a fancy place to find the answers
>>> to this, how can I get these answers?
>>> Should I mail the analytics mailing list to ask these questions? Cc a
>>> point person on Bugzilla with the question? Ping someone privately?
>>>
>>> Jon
>>>
>>> On Tue, Oct 14, 2014 at 10:30 AM, Nuria Ruiz <[email protected]> wrote:
>>> >>Woah! Nice :D How are definitions updates handled?
>>> > Since we talked about this on IRC, restating here to keep the archives
>>> > happy.
>>> > We pull the ua-parser jar from our Archiva repository; an update will
>>> > involve building a new jar, uploading it to Archiva and updating our
>>> > dependency file (pom.xml) to point to the newly updated version.
>>> >
>>> > On Fri, Oct 10, 2014 at 9:59 PM, Oliver Keyes <[email protected]>
>>> > wrote:
>>> >>
>>> >> Woah! Nice :D How are definitions updates handled?
>>> >>
>>> >> On 10 October 2014 18:58, Nuria Ruiz <[email protected]> wrote:
>>> >>>
>>> >>> >1. A UDF for ua-parser or whatever we decide to use (this will possibly
>>> >>> > be necessary for pageviews, but not necessarily - it depends on our
>>> >>> > spider/automaton detection strategy)
>>> >>> We got this one ready today:
>>> >>> https://gerrit.wikimedia.org/r/#/c/166142/
>>> >>>
>>> >>> On Fri, Oct 10, 2014 at 3:55 PM, Oliver Keyes <[email protected]>
>>> >>> wrote:
>>> >>>>
>>> >>>> On 10 October 2014 16:02, Nuria Ruiz <[email protected]> wrote:
>>> >>>>>
>>> >>>>> >At some point I believe we hope to just, you know. Have a regularly
>>> >>>>> > updated browser matrix somewhere.
>>> >>>>> I REALLY think this should make it into our goals; if it cannot be
>>> >>>>> done this quarter, it should for sure be done next quarter.
>>> >>>>
>>> >>>> I agree it would be nice. It's one of those things that will either
>>> >>>> come as a side-effect of other stuff, OR require substantially more
>>> >>>> work, and nothing in-between. Things we need for it:
>>> >>>>
>>> >>>> 1. A UDF for ua-parser or whatever we decide to use (this will possibly
>>> >>>> be necessary for pageviews, but not necessarily - it depends on our
>>> >>>> spider/automaton detection strategy)
>>> >>>> 2. Pageviews data
>>> >>>> 3. A table somewhere.
>>> >>>>
>>> >>>> Take 1, apply to 2, stick in 3. Maybe grab the same data for text/html
>>> >>>> requests overall (depends on query runtime), maybe don't.
>>> >>>>
>>> >>>> The ideal implementation, obviously, is to pair this up with a site
>>> >>>> that automatically parses the results into HTML. That should be the
>>> >>>> end goal. But in terms of engineering support we can get most of the
>>> >>>> way there simply by ensuring we always have a recent snapshot to hand.
>>> >>>> I can probably put something together over the sampled logs and
>>> >>>> throw it in SQL if there are urgent needs.
>>> >>>>
>>> >>>>>
>>> >>>>> Do we not have more recent data than May?
>>> >>>>
>>> >>>> We don't, but thanks to the utilities library I built, the code for
>>> >>>> generating it would literally run:
>>> >>>>
>>> >>>> library(WMUtils)
>>> >>>> library(data.table) # as.data.table() and the [, .N, by = ...] grouping
>>> >>>> # Bind the daily sampled logs for 2014-09-01..30, run them through data_sieve, parse the user agents
>>> >>>> uas <- as.data.table(ua_parse(data_sieve(do.call("rbind", lapply(seq(20140901, 20140930, 1), sampled_logs)))$user_agent))
>>> >>>> # Count requests per (os, browser) pair
>>> >>>> uas <- uas[, list(requests = .N), by = c("os", "browser")]
>>> >>>> write.table(uas, file = "uas_for_jon.tsv", sep = "\t", row.names = FALSE, quote = TRUE)
>>> >>>>
>>> >>>> ...assuming we didn't care about readability.
>>> >>>>
>>> >>>> Point is, in the time until we have the new parser built into Hadoop
>>> >>>> and that setup, we can totally generate interim data from the sampled
>>> >>>> logs using the same parser at a tiny cost in research/programming
>>> >>>> time, iff (the mathematical if) we need it enough that we're cool with
>>> >>>> the sampling, and people can convince [[Dario|Our Great Leader]] to
>>> >>>> authorise me to spend 15 minutes of my time on it.
>>> >>>>
>>> >>>>>
>>> >>>>> On Fri, Oct 10, 2014 at 12:45 PM, Oliver Keyes <[email protected]>
>>> >>>>> wrote:
>>> >>>>>>
>>> >>>>>> Email Dario and me; if he prioritises it, I'll run a check on more
>>> >>>>>> recent data.
>>> >>>>>>
>>> >>>>>> At some point I believe we hope to just, you know. Have a regularly
>>> >>>>>> updated browser matrix somewhere. This comes some time after
>>> >>>>>> pageviews though.
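The count-per-(os, browser) step in Oliver's R snippet, combined with the 0.05% cut-off and the unidentified-browser bucket Nuria describes, can be sketched in Python. This is a minimal sketch, not the Hive UDF or WMUtils code: it assumes the user agents have already been parsed into `(os, browser)` pairs, with `None` as the browser when the parser could not identify one. `browser_matrix`, `to_tsv` and `MIN_SHARE_PCT` are hypothetical names, and the sample counts are invented.

```python
import csv
import io
from collections import Counter

# Hypothetical constant mirroring the 0.05% cut-off mentioned in the thread.
MIN_SHARE_PCT = 0.05


def browser_matrix(parsed, min_share_pct=MIN_SHARE_PCT):
    """Turn (os, browser) pairs into per-combination traffic shares.

    `parsed` is any iterable of (os, browser) tuples, assumed to come from a
    user-agent parser; a browser of None marks an unidentified request.
    Returns (rows, unidentified_pct). Each row is (os, browser, requests, pct),
    sorted by pct descending; identified rows below `min_share_pct` are
    dropped. Percentages are of ALL requests, unidentified ones included.
    """
    counts = Counter(parsed)
    total = sum(counts.values())
    if total == 0:
        return [], 0.0
    unidentified = sum(n for (_, browser), n in counts.items() if browser is None)
    rows = []
    for (os_name, browser), n in counts.items():
        pct = 100.0 * n / total
        if browser is not None and pct >= min_share_pct:
            rows.append((os_name, browser, n, pct))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows, 100.0 * unidentified / total


def to_tsv(rows):
    """Serialise rows as tab-separated text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["os", "browser", "requests", "pct"])
    for os_name, browser, n, pct in rows:
        writer.writerow([os_name, browser, n, f"{pct:.2f}"])
    return buf.getvalue()


if __name__ == "__main__":
    # Invented counts for illustration only; not real traffic figures.
    sample = (
        [("iOS", "Mobile Safari")] * 6000
        + [("Android", "Android")] * 3000
        + [("Android", "Chrome Mobile")] * 596
        + [("Windows Phone", "IE Mobile")] * 4
        + [("Other", None)] * 400
    )
    rows, unknown_pct = browser_matrix(sample)
    print(to_tsv(rows), end="")
    print(f"unidentified: {unknown_pct:.2f}%")
```

With the invented counts above, the 4 Windows Phone requests (0.04% of 10,000) fall under the cut-off and are left out of the table, while 4.00% is reported separately as unidentified, so the kept rows do not by themselves add up to 100%.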
>>> >>>>>>
>>> >>>>>> On 10 October 2014 14:38, Toby Negrin <[email protected]>
>>> >>>>>> wrote:
>>> >>>>>>>
>>> >>>>>>> Hi Jon -- I'm sure other folks will have more information, but
>>> >>>>>>> here's a link to a slide with some data from May[1]. We don't see a
>>> >>>>>>> lot of Windows Phone traffic.
>>> >>>>>>>
>>> >>>>>>> -Toby
>>> >>>>>>>
>>> >>>>>>> [1] https://docs.google.com/a/wikimedia.org/presentation/d/19tZgTi6VUG04wfGWVzcaZKY26oQiXhPaHI9g2tBmMKE/edit#slide=id.g382406373_08
>>> >>>>>>>
>>> >>>>>>> On Fri, Oct 10, 2014 at 11:17 AM, Jon Robson <[email protected]>
>>> >>>>>>> wrote:
>>> >>>>>>>>
>>> >>>>>>>> I was going through our backlog again today, and I noticed a bug
>>> >>>>>>>> about supporting editing on Windows Phones with IE9 [1].
>>> >>>>>>>>
>>> >>>>>>>> Yet again, I wondered 'how many of our users are using IE9', since
>>> >>>>>>>> this lack of support may mean we are losing out on lots of
>>> >>>>>>>> potential editors.
>>> >>>>>>>>
>>> >>>>>>>> What's the easiest way to get this information now? Is it
>>> >>>>>>>> available?
>>> >>>>>>>>
>>> >>>>>>>> [1] https://bugzilla.wikimedia.org/show_bug.cgi?id=55599
>>> >>>>>>>>
>>> >>>>>>>> _______________________________________________
>>> >>>>>>>> Analytics mailing list
>>> >>>>>>>> [email protected]
>>> >>>>>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>> >>>>>>
>>> >>>>>> --
>>> >>>>>> Oliver Keyes
>>> >>>>>> Research Analyst
>>> >>>>>> Wikimedia Foundation
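Jon's JSON.stringify question earlier in the thread amounts to checking whether requests from Android older than 2.3 make up more than 1% of traffic. A minimal Python sketch, assuming raw user-agent strings as input and a plain regex on the `Android X.Y` token instead of ua-parser (so it will miss or misread unusual user agents); `android_version`, `pct_older_than` and `is_significant` are hypothetical helpers, and the 1% bar is simply the one Jon suggests.

```python
import re

# Matches the "Android X.Y" token that Android user agents typically carry,
# e.g. "Mozilla/5.0 (Linux; U; Android 2.2; en-us; ...)". A simple regex,
# not ua-parser.
ANDROID_VERSION = re.compile(r"Android (\d+)\.(\d+)")


def android_version(user_agent):
    """Return (major, minor) for an Android user agent, or None otherwise."""
    match = ANDROID_VERSION.search(user_agent or "")
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def pct_older_than(user_agents, cutoff=(2, 3)):
    """Percentage of ALL requests whose Android version is below `cutoff`."""
    total = 0
    older = 0
    for ua in user_agents:
        total += 1
        version = android_version(ua)
        # Tuple comparison: (2, 2) < (2, 3) is True, (2, 3) < (2, 3) is False.
        if version is not None and version < cutoff:
            older += 1
    return 100.0 * older / total if total else 0.0


def is_significant(user_agents, threshold_pct=1.0):
    """True if pre-2.3 Android traffic exceeds the assumed 1% bar."""
    return pct_older_than(user_agents) > threshold_pct
```

Comparing `(major, minor)` tuples lets Python's tuple ordering do the version check, so a user agent reporting Android 2.3.6 counts as 2.3 and is not treated as older than the cut-off.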
