Hi Nuria,

OK, so the user-agent data for edits is stored in a different database, is
heavily sampled when used for research, and will still be accessible for CU
use if user_agent_map is removed from the pageview_hourly data, right?

On Mon, Sep 28, 2015 at 10:48 AM, Nuria Ruiz <[email protected]> wrote:

> Pine:
>
> The pageview_hourly dataset in Hive contains pageviews, not edits.
>
> The majority of data for edits is not associated with a user agent, as it
> is stored in the MediaWiki database. Some of it comes via EventLogging, as
> experiments are run in, for example, VisualEditor. This second source of
> data is very different in nature from the one we just ran this test on:
> it is heavily sampled, not public, and will be purged every 90 days.
>
> https://wikitech.wikimedia.org/wiki/Analytics/EventLogging#Data_retention_and_auto-purging
>
>
> Thanks,
>
> Nuria
>
> On Mon, Sep 28, 2015 at 7:23 AM, Pine W <[email protected]> wrote:
>
>> Hi Nuria,
>>
>> Thanks for working on this.
>>
>> Removing user_agent_map would be only for readership data, correct? Would
>> this data still be stored for edits, and if so, for how long?
>>
>> Pine
>> On Sep 28, 2015 7:16 AM, "Nuria Ruiz" <[email protected]> wrote:
>>
>>> Hello,
>>>
>>> We have been working on the exercise of reconstructing an identity using
>>> the (still private) pageview_hourly dataset (
>>> https://wikitech.wikimedia.org/wiki/Analytics/Data/Pageview_hourly)
>>>
>>> TL;DR
>>> It is possible (and easy) to do that with the fields the dataset has
>>> now. Before releasing it publicly, we need to anonymize it further.
>>>
>>> More info here:
>>>
>>> https://wikitech.wikimedia.org/wiki/Analytics/Data/PreventingIdentityReconstruction
>>>
>>> Thanks,
>>>
>>> Nuria
>>>
>>> _______________________________________________
>>> Analytics mailing list
>>> [email protected]
>>> https://lists.wikimedia.org/mailman/listinfo/analytics
