>Basically, to capture only people who already have a Wikimedia-cookie, and
>count those.
Ah, yes, now I get it.

Yes. We have done these calculations and they underreport by quite a bit,
because you need two visits to Wikipedia to have a cookie (the cookie is set on
your first visit and sent back on the 2nd visit), so, as you said, you will
miss all 1-hit visits in a monthly period, for example. Whether this matters
depends on users' browsing patterns, and it turns out that 1-hit visits make up
quite a significant part of our traffic.
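To make the 1-hit effect concrete, here is a toy sketch in Python. It is a model, not the production job: it assumes a hypothetical ground-truth device id (real traffic does not expose one) and made-up visit counts, and it treats "marking the cookie" as simply counting each returning device once.

```python
from __future__ import annotations

from typing import NamedTuple


class Request(NamedTuple):
    device: str       # hypothetical ground-truth device id; not observable in real traffic
    has_cookie: bool  # True if the request sent back a cookie set on an earlier visit


def simulate_month(visits_per_device: dict[str, int]) -> list[Request]:
    """Expand per-device visit counts into requests. The cookie is set on the
    first visit and only sent back from the second visit onwards."""
    return [
        Request(device, has_cookie=i > 0)
        for device, n in visits_per_device.items()
        for i in range(n)
    ]


def true_uniques(requests: list[Request]) -> int:
    # Every device seen at least once in the month.
    return len({r.device for r in requests})


def recurring_uniques(requests: list[Request]) -> int:
    # The proposed "recurring" count: only devices that come back with a cookie.
    # Counting each device once models marking the cookie after the first count.
    return len({r.device for r in requests if r.has_cookie})


if __name__ == "__main__":
    # Made-up example month: two 1-hit devices and two returning devices.
    reqs = simulate_month({"a": 1, "b": 1, "c": 3, "d": 2})
    print(true_uniques(reqs), recurring_uniques(reqs))  # 4 2
```

In this toy month the recurring count sees only half of the devices, because devices "a" and "b" never send a cookie back; how large the gap is in reality depends on the real share of 1-hit visits.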




On Fri, Apr 8, 2016 at 11:22 AM, Denny Vrandečić <[email protected]>
wrote:

> +Wikimedia Analytics <[email protected]>
>
> Thanks for pointing me to the list, I should have written there in the
> first place.
>
> Sorry, by the term "user agent" I didn't mean the actual user agent
> string, but rather what you are trying to express with "unique device",
> i.e. the different browsers on a single mobile device. I should have just
> stayed with your terminology to make it less confusing.
>
> Basically, to capture only people who already have a Wikimedia-cookie, and
> count those. This would still underreport, as it would miss everyone who
> only came once, but not by too much, I'd think. Right now I am more worried
> about overreporting.
>
> I hope this is a bit clearer.
>
>
>
> On Fri, Apr 8, 2016 at 11:16 AM Nuria Ruiz <[email protected]> wrote:
>
>> Denny:
>>
>> Best list to ask these kinds of questions is analytics@ (cc-ed).
>>
>> >A minor question - could you also count the number of unique recurring
>> user agents per month? I.e. the number of visits that return and have a
>> still valid cookie (e.g. by >marking the cookie after the count).
>> mmm...Not sure what you mean by "recurring" as you can have thousands of
>> people with the same user agent, right? Think "everyone in Seattle with an
>> iPhone and the latest OS using Safari". You can add other pieces of info
>> like IP, but on mobile, due to NAT-ing [1], that can also mean a group of
>> thousands of people. So using "recurring user agents" as the basis for your
>> main calculation will always heavily under-report the number of unique
>> devices.
>>
>> Now, I might be missing something, as your question is brief; maybe you
>> can elaborate a bit more?
>>
>>
>> >I am worried that the current number, due to the freshness offset  might
>> be overreporting
>> Since the offset calculation takes IP into account when looking for
>> freshness, and it only keeps devices having 1 event without cookies and 0
>> with cookies, the calculation is likely to under-report on mobile due to,
>> again, NAT-ing and user agents being shared among many devices. We see this
>> in our data as smaller offset numbers in mobile projects than in desktop
>> projects. Now, this methodology might over-report for a user who uses many
>> distinct IPs with the same browser, does 1 request and clears cookies after
>> every session, but this is a far less common scenario.
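A simplified Python model of the offset rule as described above (count an (IP, user agent) fingerprint only if it has exactly 1 request without a cookie and 0 with one) shows both failure modes. This is a sketch under stated assumptions, not the production calculation: the IPs come from documentation ranges, the user agent strings are placeholders, and the real job has details not modeled here.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Request(NamedTuple):
    ip: str
    user_agent: str   # placeholder strings, not real UA headers
    has_cookie: bool


def freshness_offset(requests: Iterable[Request]) -> int:
    """Simplified model of the offset: count (IP, user agent) fingerprints
    with exactly 1 request without a cookie and 0 requests with one."""
    without_cookie: Counter = Counter()
    with_cookie: Counter = Counter()
    for r in requests:
        key = (r.ip, r.user_agent)
        if r.has_cookie:
            with_cookie[key] += 1
        else:
            without_cookie[key] += 1
    # Counter returns 0 for unseen keys, so fingerprints never seen with a cookie pass.
    return sum(1 for key, n in without_cookie.items()
               if n == 1 and with_cookie[key] == 0)


if __name__ == "__main__":
    # NAT case: three distinct 1-hit phones behind one carrier IP, same browser.
    nat = [Request("203.0.113.7", "iPhone Safari", False) for _ in range(3)]
    # Rare case: one person on three IPs who clears cookies every session.
    roaming = [Request(f"198.51.100.{i}", "Firefox", False) for i in range(1, 4)]
    print(freshness_offset(nat), freshness_offset(roaming))  # 0 3
```

The NAT case yields 0 for three real devices (under-reporting), while the roaming case yields 3 for one person (over-reporting).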
>>
>> Hopefully this makes sense.
>>
>>
>> >Again, congratulations on the work! I am really happy to see the WMF
>> not being dependent on a commercial traffic numbers provider anymore!
>> Many thanks for reading!
>>
>>  [1] https://en.wikipedia.org/wiki/Network_address_translation
>>
>> On Fri, Apr 8, 2016 at 10:30 AM, Denny Vrandečić <[email protected]>
>> wrote:
>>
>>> Hi Nuria, Aaron,
>>>
>>> First, congratulations on the Unique Devices work! I am really impressed
>>> by the solution and the dataset. I am looking forward to the visualizations
>>> that will come out of this.
>>>
>>> A minor question - could you also count the number of unique recurring
>>> user agents per month? I.e. the number of visits that return and have a
>>> still valid cookie (e.g. by marking the cookie after the count).
>>>
>>> My reasoning is the following: although it would likely further
>>> underreport the number of unique user agents, it would get rid of all user
>>> agents that clear out their cookies or use some form of incognito mode. It
>>> would only count people who have been there, got a cookie and returned; we
>>> would then mark the cookie and not count them again until it expires.
>>>
>>> I am worried that the current number, due to the freshness offset [1],
>>> might be overreporting, and I do not agree fully with your reasoning in
>>> that page that this is OK. Counting only the recurring ones would clean
>>> that up and give a more reliable number, although it would potentially
>>> underreport the people who indeed only come once a month (a number I don't
>>> expect to be too large).
>>>
>>> It would be interesting to see these two numbers side by side.
>>>
>>> Again, congratulations on the work! I am really happy to see the WMF not
>>> being dependent on a commercial traffic numbers provider anymore!
>>>
>>> Cheers,
>>> Denny
>>>
>>>
>>> [1]
>>> https://wikitech.wikimedia.org/wiki/Analytics/Unique_Devices/Last_access_solution#How_big_of_a_percentage_does_the_offset_represent_from_the_total.3F
>>>
>>>
>>
>>
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
