Pine:

The pageview_hourly dataset in Hive contains pageviews, not edits.

Most edit data is not associated with a user agent, because it is stored in the MediaWiki database. Some of it comes in through EventLogging when experiments are run in, for example, VisualEditor. This second source of data is very different in nature from the one we just ran this test on: it is heavily sampled, not public, and will be purged every 90 days.

https://wikitech.wikimedia.org/wiki/Analytics/EventLogging#Data_retention_and_auto-purging

Thanks,

Nuria

On Mon, Sep 28, 2015 at 7:23 AM, Pine W <[email protected]> wrote:

> Hi Nuria,
>
> Thanks for working on this.
>
> Removing user_agent_map would be only for readership data, correct? Would
> this data still be stored for edits, and if so, for how long?
>
> Pine
>
> On Sep 28, 2015 7:16 AM, "Nuria Ruiz" <[email protected]> wrote:
>
>> Hello,
>>
>> We have been working on the exercise of reconstructing an identity using
>> the (still private) pageview_hourly dataset
>> (https://wikitech.wikimedia.org/wiki/Analytics/Data/Pageview_hourly)
>>
>> TL;DR
>> It is possible (and easy) to do that with the fields the dataset has now,
>> before releasing it publicly we need to further anonymize it.
>>
>> More info here:
>>
>> https://wikitech.wikimedia.org/wiki/Analytics/Data/PreventingIdentityReconstruction
>>
>> Thanks,
>>
>> Nuria
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
