Thank you very much for the clarification.

On Fri, Apr 8, 2016, 15:47 Nuria Ruiz <[email protected]> wrote:

> We can share the data with you informally, but the gist of it is the plot
> you linked to
>
> https://wikitech.wikimedia.org/wiki/Analytics/Unique_Devices/Last_access_solution#How_big_of_a_percentage_does_the_offset_represent_from_the_total.3F
>
> For monthly uniques as of this February, these are the numbers for
> English Wikipedia. Offsets vary quite a bit between projects and, by the
> nature of the calculation, are smaller on mobile than on desktop.
>
> A couple of data points below:
>
> domain               underestimate   offset     offset percentage
> en.m.wikipedia.org   314989256       74130374   ~20%
> en.wikipedia.org     181066391       74848500   ~30%
>
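
A quick arithmetic check on the two data points quoted above. This is only a
sketch under an assumption: that the "offset percentage" column is the offset
divided by the total (underestimate + offset). That reading is not stated in
the thread, but it reproduces the rounded ~20% and ~30% figures, whereas
dividing by the underestimate alone gives noticeably larger values. The
function name offset_share is made up for illustration.

```python
def offset_share(underestimate: int, offset: int) -> float:
    """Return the offset as a fraction of the total (underestimate + offset).

    Assumption: this is how the "offset percentage" column was derived.
    """
    total = underestimate + offset
    if total <= 0:
        raise ValueError("underestimate + offset must be positive")
    return offset / total


# Monthly figures for English Wikipedia as quoted in the thread:
# domain -> (underestimate, offset)
figures = {
    "en.m.wikipedia.org": (314989256, 74130374),
    "en.wikipedia.org": (181066391, 74848500),
}

for domain, (underestimate, offset) in figures.items():
    share = offset_share(underestimate, offset)
    print(f"{domain}: offset is {share:.1%} of the total")
```

Under this reading, the mobile offset comes out at roughly 19% of the total
and the desktop offset at roughly 29%, in line with the rounded values quoted.
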
> On Fri, Apr 8, 2016 at 2:37 PM, Denny Vrandečić <[email protected]>
> wrote:
>
>> Yes, how big that part is, that is what I would be curious about.
>>
>> On Fri, Apr 8, 2016 at 11:32 AM Nuria Ruiz <[email protected]> wrote:
>>
>>> >Basically, to capture only people who already have a Wikimedia-cookie,
>>> and count those.
>>> Ah, yes, now I get it.
>>>
>>> Yes. We have done these calculations and they under-report by quite a
>>> bit, because you need two visits to Wikipedia to have a cookie (the cookie
>>> is set on your first visit and sent back on the second). So, as you said,
>>> you will miss all 1-hit visits in a monthly period, for example. Whether
>>> this matters depends on users' browsing patterns; it turns out that 1-hit
>>> visits make up quite a significant part of our traffic.
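
To illustrate why counting only requests that carry a cookie misses 1-hit
devices, here is a toy sketch. It assumes a simplified model that is not the
actual Wikimedia pipeline: each request is tagged with a hypothetical device
id (which a real system does not have), no cookie survives from an earlier
period, and nobody clears cookies mid-period. Under those assumptions a device
is visible to cookie-only counting only if it makes at least two requests in
the period.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_devices(requests: Iterable[str]) -> Tuple[int, int]:
    """Return (all_devices, cookie_visible_devices) for one period.

    `requests` holds one hypothetical device id per request. A device's
    first request sets the cookie; only later requests send it back, so
    cookie-only counting sees devices with two or more requests.
    """
    hits = Counter(requests)
    all_devices = len(hits)
    cookie_visible = sum(1 for n in hits.values() if n >= 2)
    return all_devices, cookie_visible


# Toy data: devices "a" and "b" return, "c" and "d" visit exactly once.
toy_requests = ["a", "b", "a", "c", "d", "b", "a"]
total, visible = count_devices(toy_requests)
print(f"devices: {total}, seen by cookie-only counting: {visible}")
```

In this toy data half of the devices are 1-hit and therefore invisible to
cookie-only counting; the real proportion depends on actual browsing patterns.
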
>>>
>>>
>>>
>>>
>>> On Fri, Apr 8, 2016 at 11:22 AM, Denny Vrandečić <[email protected]>
>>> wrote:
>>>
>>>> +Wikimedia Analytics <[email protected]>
>>>>
>>>> Thanks for pointing me to the list, I should have written there in the
>>>> first place.
>>>>
>>>> Sorry, by the term "user agent" I didn't mean the actual user agent
>>>> string, but rather what you are trying to express with "unique device" -
>>>> i.e. the different browsers on a single mobile device. I should have just
>>>> stayed with your terminology to make it less confusing.
>>>>
>>>> Basically, to capture only people who already have a Wikimedia-cookie,
>>>> and count those. This would still underreport - as it would miss all that
>>>> only came once - but not by too much, I'd think. Right now I am more
>>>> worried about overreporting.
>>>>
>>>> I hope this is a bit clearer.
>>>>
>>>>
>>>>
>>>> On Fri, Apr 8, 2016 at 11:16 AM Nuria Ruiz <[email protected]> wrote:
>>>>
>>>>> Denny:
>>>>>
>>>>> Best list to ask these kinds of questions is analytics@ (cc-ed).
>>>>>
>>>>> >A minor question - could you also count the number of unique
>>>>> recurring user agents per month? I.e. the number of visits that return and
>>>>> have a still valid cookie (e.g. by marking the cookie after the count).
>>>>> Mmm... Not sure what you mean by "recurring", as you can have thousands
>>>>> of people with the same user agent, right? Think "everyone in Seattle with
>>>>> an iPhone and the latest OS using Safari". You can add other pieces of
>>>>> info like IP, but in mobile, due to NAT-ing [1], that can also mean a
>>>>> group of thousands of people. So it will always heavily under-report the
>>>>> number of unique devices if you use "recurring user agents" as the base
>>>>> for your main calculation.
>>>>>
>>>>> Now, I might be missing something as your question is brief; maybe you
>>>>> can elaborate a bit more?
>>>>>
>>>>>
>>>>> >I am worried that the current number, due to the freshness offset
>>>>>  might be overreporting
>>>>> Since the offset calculation takes IP into account when looking for
>>>>> freshness, and it only keeps devices having 1 event without cookies and 0
>>>>> with cookies, the calculation is likely to under-report in mobile due to,
>>>>> again, NAT-ing and user agents being shared among many devices. We see
>>>>> this in our data as smaller offset numbers in mobile projects than in
>>>>> desktop projects. Now, this methodology might over-report for a user who
>>>>> uses many distinct IPs with the same browser, makes 1 request, and clears
>>>>> cookies after every session, but this is a far less common scenario.
>>>>>
>>>>> Hopefully this makes sense.
>>>>>
>>>>>
>>>>> >Again, congratulations on the work! I am really happy to see the WMF
>>>>> not being dependent on a commercial traffic numbers provider anymore!
>>>>> Many thanks for reading!
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>  [1] https://en.wikipedia.org/wiki/Network_address_translation
>>>>>
>>>>> On Fri, Apr 8, 2016 at 10:30 AM, Denny Vrandečić <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Nuria, Aaron,
>>>>>>
>>>>>> first congratulations on the Unique devices work! I am really
>>>>>> impressed by the solution and the dataset. I am looking forward to the
>>>>>> visualizations that will come out from this.
>>>>>>
>>>>>> A minor question - could you also count the number of unique
>>>>>> recurring user agents per month? I.e. the number of visits that return
>>>>>> and have a still valid cookie (e.g. by marking the cookie after the
>>>>>> count).
>>>>>>
>>>>>> My reasoning is the following: knowing well that it would possibly
>>>>>> further underreport the number of unique user agents, it would get rid of
>>>>>> all user agents that clean their cookies out or that use some form of
>>>>>> incognito mode. It would only count people who have been there, got a
>>>>>> cookie, returned, and then we mark the cookie, and don't count them
>>>>>> further until it expires.
>>>>>>
>>>>>> I am worried that the current number, due to the freshness offset
>>>>>> [1], might be overreporting, and I do not agree fully with your reasoning
>>>>>> in that page that this is OK. Counting only the recurring ones would
>>>>>> clean that up, give a more reliable number, although it would potentially
>>>>>> underreport the people who indeed only come once a month (a number I
>>>>>> don't expect to be too large).
>>>>>>
>>>>>> It would be interesting to see these two numbers side by side.
>>>>>>
>>>>>> Again, congratulations on the work! I am really happy to see the WMF
>>>>>> not being dependent on a commercial traffic numbers provider anymore!
>>>>>>
>>>>>> Cheers,
>>>>>> Denny
>>>>>>
>>>>>>
>>>>>> [1]
>>>>>> https://wikitech.wikimedia.org/wiki/Analytics/Unique_Devices/Last_access_solution#How_big_of_a_percentage_does_the_offset_represent_from_the_total.3F
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>
>
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
