Yes, how big that part is, that is what I would be curious about.

On Fri, Apr 8, 2016 at 11:32 AM Nuria Ruiz <[email protected]> wrote:

> >Basically, to capture only people who already have a Wikimedia-cookie,
> >and count those.
> Ah, yes, now I get it.
>
> Yes. We have done these calculations and they underreport by quite a bit
> because you need two visits to Wikipedia to have a cookie (the cookie is
> set on your first visit and sent back on the 2nd visit), so as you said
> you will miss all 1-hit visits in a monthly period, for example. Whether
> this matters depends on users' browsing patterns; it turns out that 1-hit
> visits make up quite a significant part of our traffic.
>
> On Fri, Apr 8, 2016 at 11:22 AM, Denny Vrandečić <[email protected]>
> wrote:
>
>> +Wikimedia Analytics <[email protected]>
>>
>> Thanks for pointing me to the list, I should have written there in the
>> first place.
>>
>> Sorry, with the term "user agent" I didn't mean the actual user agent
>> string, but rather what you are trying to express with "unique device" -
>> i.e. the different browsers on a single mobile device. I should have just
>> stayed with your terminology to make it less confusing.
>>
>> Basically, to capture only people who already have a Wikimedia-cookie,
>> and count those. This would still underreport - as it would miss all that
>> only came once - but not by too much, I'd think. Right now I am more
>> worried about overreporting.
>>
>> I hope this is a bit clearer.
>>
>> On Fri, Apr 8, 2016 at 11:16 AM Nuria Ruiz <[email protected]> wrote:
>>
>>> Denny:
>>>
>>> Best list to ask these kinds of questions is analytics@ (cc-ed).
>>>
>>> >A minor question - could you also count the number of unique recurring
>>> >user agents per month? I.e. the number of visits that return and have a
>>> >still valid cookie (e.g. by marking the cookie after the count).
>>> mmm...Not sure what you mean by "recurring" as you can have thousands of
>>> people with the same user agent, right? Think "everyone in Seattle with
>>> an iPhone and the latest OS using Safari". You can add other pieces of
>>> info like IP, but in mobile and due to NAT-ing [1] that can also mean a
>>> group of thousands of people. So it will always under-report heavily the
>>> number of unique devices if you use "recurring user agents" as base for
>>> your main calculation.
>>>
>>> Now, I might be missing something as your question is brief, maybe you
>>> can elaborate a bit more?
>>>
>>> >I am worried that the current number, due to the freshness offset
>>> >might be overreporting
>>> Since the offset calculation takes IP into account when looking for
>>> freshness and it only keeps devices having 1 event without cookies and 0
>>> with cookies, the calculation is likely to under-report in mobile, due
>>> to, again, NAT-ing and user agents being shared among many devices. We
>>> see this in our data as smaller offset numbers in mobile projects than
>>> desktop projects. Now, this methodology might overreport for a user that
>>> uses many distinct IPs, same browser, does 1 request and clears cookies
>>> after every session, but this is a far less common scenario.
>>>
>>> Hopefully this makes sense.
>>>
>>> >Again, congratulations on the work! I am really happy to see the WMF
>>> >not being dependent on a commercial traffic numbers provider anymore!
>>> Many thanks for reading!
>>>
>>> [1] https://en.wikipedia.org/wiki/Network_address_translation
>>>
>>> On Fri, Apr 8, 2016 at 10:30 AM, Denny Vrandečić <[email protected]>
>>> wrote:
>>>
>>>> Hi Nuria, Aaron,
>>>>
>>>> first congratulations on the Unique devices work! I am really impressed
>>>> by the solution and the dataset. I am looking forward to the
>>>> visualizations that will come out from this.
>>>>
>>>> A minor question - could you also count the number of unique recurring
>>>> user agents per month? I.e. the number of visits that return and have a
>>>> still valid cookie (e.g. by marking the cookie after the count).
>>>>
>>>> My reasoning is the following: knowing well that it would possibly
>>>> further underreport the number of unique user agents, it would get rid
>>>> of all user agents that clean their cookies out or that use some form of
>>>> incognito mode. It would only count people who have been there, got a
>>>> cookie, returned, and then we mark the cookie, and don't count them
>>>> further until it expires.
>>>>
>>>> I am worried that the current number, due to the freshness offset [1],
>>>> might be overreporting, and I do not agree fully with your reasoning in
>>>> that page that this is OK. Counting only the recurring ones would clean
>>>> that up, give a more reliable number, although it would potentially
>>>> underreport the people who indeed only come once a month (a number I
>>>> don't expect to be too large).
>>>>
>>>> It would be interesting to see these two numbers side by side.
>>>>
>>>> Again, congratulations on the work! I am really happy to see the WMF
>>>> not being dependent on a commercial traffic numbers provider anymore!
>>>>
>>>> Cheers,
>>>> Denny
>>>>
>>>> [1]
>>>> https://wikitech.wikimedia.org/wiki/Analytics/Unique_Devices/Last_access_solution#How_big_of_a_percentage_does_the_offset_represent_from_the_total.3F
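
For readers following the thread, the trade-off discussed above can be made concrete with a toy example. This is only a minimal sketch under stated assumptions, not the actual Unique Devices implementation: the request tuples, the ground-truth "device" field (which a real request log would not contain), and the function names are all hypothetical. It compares counting only requests that arrive with a cookie against the offset rule described in the thread (keep an IP and user agent pair only if it has exactly 1 event without a cookie and 0 with a cookie).

```python
from collections import Counter

# Hypothetical one-month request log: (device, ip, user_agent, has_cookie).
# "device" is ground truth for illustration only; real logs do not have it.
REQUESTS = [
    ("d1", "10.0.0.1", "UA-A", False),  # first visit, cookie gets set
    ("d1", "10.0.0.1", "UA-A", True),   # return visit, cookie is sent back
    ("d2", "10.0.0.2", "UA-B", False),  # one-hit visitor
    ("d3", "10.0.0.9", "UA-C", False),  # one-hit visitor behind NAT
    ("d4", "10.0.0.9", "UA-C", False),  # another one-hit visitor, same IP and UA
]


def true_device_count(requests):
    """Number of distinct devices (only knowable in this toy setting)."""
    return len({device for device, _ip, _ua, _cookie in requests})


def cookie_only_count(requests):
    """Devices that sent a cookie at least once, i.e. returning visitors.

    Deduplicating by device stands in for marking the cookie after counting.
    """
    return len({device for device, _ip, _ua, has_cookie in requests if has_cookie})


def offset_count(requests):
    """(ip, user_agent) pairs with exactly 1 cookieless event and 0 cookie events."""
    without_cookie = Counter()
    with_cookie = Counter()
    for _device, ip, ua, has_cookie in requests:
        if has_cookie:
            with_cookie[(ip, ua)] += 1
        else:
            without_cookie[(ip, ua)] += 1
    return sum(
        1
        for signature, hits in without_cookie.items()
        if hits == 1 and with_cookie[signature] == 0
    )


if __name__ == "__main__":
    print("true devices:     ", true_device_count(REQUESTS))
    print("cookie-only count:", cookie_only_count(REQUESTS))
    print("offset count:     ", offset_count(REQUESTS))
```

On this toy log there are 4 devices. The cookie-only count sees 1 (it misses all three one-hit visitors), and the offset adds 1 more (the two one-hit visitors sharing an IP and user agent behind NAT are dropped), so both approaches underreport here, by different amounts.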
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
