Thank you very much for the clarification.

On Fri, Apr 8, 2016, 15:47 Nuria Ruiz <[email protected]> wrote:
> We can share the data with you informally, but the gist of it is the plot
> you linked to:
>
> https://wikitech.wikimedia.org/wiki/Analytics/Unique_Devices/Last_access_solution#How_big_of_a_percentage_does_the_offset_represent_from_the_total.3F
>
> For monthly uniques data as of this February, these are the numbers for
> English Wikipedia. Offsets vary quite a bit across projects and, by the
> nature of the calculation, are smaller on mobile than on desktop.
>
> A couple of data points below:
>
>                       underestimate     offset   offset percentage
> en.m.wikipedia.org      314989256   74130374   ~20%
> en.wikipedia.org        181066391   74848500   ~30%
>
> On Fri, Apr 8, 2016 at 2:37 PM, Denny Vrandečić <[email protected]>
> wrote:
>
>> Yes, how big that part is, that is what I would be curious about.
>>
>> On Fri, Apr 8, 2016 at 11:32 AM Nuria Ruiz <[email protected]> wrote:
>>
>>> > Basically, to capture only people who already have a Wikimedia-cookie,
>>> > and count those.
>>>
>>> Ah, yes, now I get it.
>>>
>>> Yes. We have done these calculations and they under-report by quite a
>>> bit, because you need two visits to Wikipedia to have a cookie (the
>>> cookie is set on your first visit and sent back on the second visit).
>>> So, as you said, you will miss all 1-hit visits in a monthly period,
>>> for example. Whether this matters depends on users' browsing patterns;
>>> it turns out that 1-hit visits make up quite a significant part of our
>>> traffic.
>>>
>>> On Fri, Apr 8, 2016 at 11:22 AM, Denny Vrandečić <[email protected]>
>>> wrote:
>>>
>>>> +Wikimedia Analytics <[email protected]>
>>>>
>>>> Thanks for pointing me to the list, I should have written there in the
>>>> first place.
>>>>
>>>> Sorry, with the "user agent" term, I didn't mean the actual user agent
>>>> string, but rather what you are trying to express with "unique device" -
>>>> i.e. the different browsers on a single mobile device. I should have just
>>>> stayed with your terminology to make it less confusing.
>>>>
>>>> Basically, to capture only people who already have a Wikimedia-cookie,
>>>> and count those. This would still underreport - as it would miss all that
>>>> only came once - but not by too much, I'd think. Right now I am more
>>>> worried about overreporting.
>>>>
>>>> I hope this is a bit clearer.
>>>>
>>>> On Fri, Apr 8, 2016 at 11:16 AM Nuria Ruiz <[email protected]> wrote:
>>>>
>>>>> Denny:
>>>>>
>>>>> Best list to ask these kinds of questions is analytics@ (cc-ed).
>>>>>
>>>>> > A minor question - could you also count the number of unique
>>>>> > recurring user agents per month? I.e. the number of visits that
>>>>> > return and have a still valid cookie (e.g. by marking the cookie
>>>>> > after the count).
>>>>>
>>>>> mmm... Not sure what you mean by "recurring" as you can have thousands
>>>>> of people with the same user agent, right? Think "everyone in Seattle with
>>>>> an iPhone and the latest OS using Safari". You can add other pieces of
>>>>> info like IP, but in mobile and due to NAT-ing [1] that can also mean a
>>>>> group of thousands of people. So it will always under-report heavily the
>>>>> number of unique devices if you use "recurring user agents" as base for
>>>>> your main calculation.
>>>>>
>>>>> Now, I might be missing something as your question is brief, maybe you
>>>>> can elaborate a bit more?
>>>>>
>>>>> > I am worried that the current number, due to the freshness offset
>>>>> > might be overreporting
>>>>>
>>>>> Since the offset calculation takes IP into account when looking for
>>>>> freshness and it only keeps devices having 1 event without cookies and 0
>>>>> with cookies the calculation is likely to under-report in mobile, due to,
>>>>> again, NAT-ing and user agents being shared among many devices. We see
>>>>> this on our data as smaller offset numbers in mobile projects than desktop
>>>>> projects.
>>>>> Now, this methodology might over-report for a user who uses many
>>>>> distinct IPs and the same browser, does 1 request, and clears cookies
>>>>> after every session, but this is a far less common scenario.
>>>>>
>>>>> Hopefully this makes sense.
>>>>>
>>>>> > Again, congratulations on the work! I am really happy to see the WMF
>>>>> > not being dependent on a commercial traffic numbers provider anymore!
>>>>>
>>>>> Many thanks for reading!
>>>>>
>>>>> [1] https://en.wikipedia.org/wiki/Network_address_translation
>>>>>
>>>>> On Fri, Apr 8, 2016 at 10:30 AM, Denny Vrandečić <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Nuria, Aaron,
>>>>>>
>>>>>> first congratulations on the Unique devices work! I am really
>>>>>> impressed by the solution and the dataset. I am looking forward to the
>>>>>> visualizations that will come out from this.
>>>>>>
>>>>>> A minor question - could you also count the number of unique
>>>>>> recurring user agents per month? I.e. the number of visits that return
>>>>>> and have a still valid cookie (e.g. by marking the cookie after the
>>>>>> count).
>>>>>>
>>>>>> My reasoning is the following: knowing well that it would possibly
>>>>>> further underreport the number of unique user agents, it would get rid of
>>>>>> all user agents that clean their cookies out or that use some form of
>>>>>> incognito mode. It would only count people who have been there, got a
>>>>>> cookie, returned, and then we mark the cookie, and don't count them
>>>>>> further until it expires.
>>>>>>
>>>>>> I am worried that the current number, due to the freshness offset
>>>>>> [1], might be overreporting, and I do not agree fully with your reasoning
>>>>>> in that page that this is OK.
>>>>>> Counting only the recurring ones would clean
>>>>>> that up, give a more reliable number, although it would potentially
>>>>>> underreport the people who indeed only come once a month (a number I
>>>>>> don't expect to be too large).
>>>>>>
>>>>>> It would be interesting to see these two numbers side by side.
>>>>>>
>>>>>> Again, congratulations on the work! I am really happy to see the WMF
>>>>>> not being dependent on a commercial traffic numbers provider anymore!
>>>>>>
>>>>>> Cheers,
>>>>>> Denny
>>>>>>
>>>>>> [1]
>>>>>> https://wikitech.wikimedia.org/wiki/Analytics/Unique_Devices/Last_access_solution#How_big_of_a_percentage_does_the_offset_represent_from_the_total.3F
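
Editorial sketch: the two data points quoted in the thread can be checked with a few lines of arithmetic. This assumes that "offset percentage" means the offset divided by the total (underestimate plus offset), which is what the title of the linked wikitech section suggests. The function name `offset_share` is made up for this illustration. Under that assumption the shares come out at roughly 19% and 29%, consistent with the rounded ~20% and ~30% quoted above.

```python
# Figures quoted in the thread: monthly uniques, February, English Wikipedia.
# Each entry is (underestimate, offset).
FIGURES = {
    "en.m.wikipedia.org": (314989256, 74130374),
    "en.wikipedia.org": (181066391, 74848500),
}


def offset_share(underestimate: int, offset: int) -> float:
    """Return the offset as a fraction of the total.

    Assumption (not stated explicitly in the thread): the total number of
    unique devices is underestimate + offset.
    """
    return offset / (underestimate + offset)


if __name__ == "__main__":
    for project, (underestimate, offset) in FIGURES.items():
        total = underestimate + offset
        share = offset_share(underestimate, offset)
        print(f"{project}: total={total} offset share={share:.1%}")
```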
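
Editorial sketch: the offset rule described in the thread ("it only keeps devices having 1 event without cookies and 0 with cookies", with IP taken into account) can be illustrated roughly as below. This is not the actual Wikimedia implementation. The `Request` record, its field names, and the choice of (IP, user agent) as the device signature are assumptions made for this example.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Request(NamedTuple):
    """Hypothetical request record; the field names are illustrative only."""
    ip: str
    user_agent: str
    has_cookie: bool


def count_offset(requests: Iterable[Request]) -> int:
    """Count signatures with exactly 1 cookieless request and 0 cookie requests.

    A signature is assumed here to be the (ip, user_agent) pair.
    """
    without_cookie = defaultdict(int)
    with_cookie = defaultdict(int)
    for request in requests:
        signature = (request.ip, request.user_agent)
        if request.has_cookie:
            with_cookie[signature] += 1
        else:
            without_cookie[signature] += 1
    return sum(
        1
        for signature, hits in without_cookie.items()
        if hits == 1 and with_cookie.get(signature, 0) == 0
    )


# Two one-hit devices on different IPs: both are kept in the offset.
separate_ips = [
    Request("198.51.100.7", "Firefox", False),
    Request("203.0.113.9", "Firefox", False),
]

# Two one-hit phones behind the same NAT-ed IP with the same user agent:
# the shared signature has 2 cookieless requests, so neither is kept.
behind_nat = [
    Request("192.0.2.1", "Mobile Safari", False),
    Request("192.0.2.1", "Mobile Safari", False),
]

if __name__ == "__main__":
    print(count_offset(separate_ips))
    print(count_offset(behind_nat))
```

The NAT case shows why, under these assumptions, the offset tends to under-report on mobile: devices that share an IP and user agent collapse into one signature, which then fails the "exactly 1 cookieless request" test.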
