Pine:

The pageview_hourly dataset in Hive contains pageviews, not edits.

Most edit data is not associated with a user agent, because it is stored
in the MediaWiki database. Some of it comes in via EventLogging when
experiments are run in, for example, VisualEditor. This second source of
data is very different in nature from the one we just ran this test on:
it is heavily sampled, not public, and purged every 90 days.
https://wikitech.wikimedia.org/wiki/Analytics/EventLogging#Data_retention_and_auto-purging


Thanks,

Nuria

On Mon, Sep 28, 2015 at 7:23 AM, Pine W <[email protected]> wrote:

> Hi Nuria,
>
> Thanks for working on this.
>
> Removing user_agent_map would be only for readership data, correct? Would
> this data still be stored for edits, and if so, for how long?
>
> Pine
> On Sep 28, 2015 7:16 AM, "Nuria Ruiz" <[email protected]> wrote:
>
>> Hello,
>>
>> We have been working on the exercise of reconstructing an identity using
>> the (still private) pageview_hourly dataset (
>> https://wikitech.wikimedia.org/wiki/Analytics/Data/Pageview_hourly)
>>
>> TL;DR
>> It is possible (and easy) to do that with the fields the dataset has now,
>> so before releasing it publicly we need to anonymize it further.
>>
>> More info here:
>>
>> https://wikitech.wikimedia.org/wiki/Analytics/Data/PreventingIdentityReconstruction
>>
>> Thanks,
>>
>> Nuria
>>
>> _______________________________________________
>> Analytics mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>
>>