Hi Nuria,

OK, so the user-agent data for edits is stored in a different database, is heavily sampled when used for research, and will still be accessible for CU use if user_agent_map is removed from the pageview_hourly data, right?

On Mon, Sep 28, 2015 at 10:48 AM, Nuria Ruiz <[email protected]> wrote:

> Pine:
>
> The pageview_hourly dataset on Hive contains pageviews, not edits.
>
> The majority of data for edits is not associated with a user agent, as it is
> stored in the MediaWiki database. Some of it comes via EventLogging as
> experiments are run in, for example, VisualEditor. This second source of
> data is of a very different nature from the one we just ran this test on:
> it is heavily sampled, not public, and will be purged every 90 days.
>
> https://wikitech.wikimedia.org/wiki/Analytics/EventLogging#Data_retention_and_auto-purging
>
> Thanks,
>
> Nuria
>
> On Mon, Sep 28, 2015 at 7:23 AM, Pine W <[email protected]> wrote:
>
>> Hi Nuria,
>>
>> Thanks for working on this.
>>
>> Removing user_agent_map would be only for readership data, correct? Would
>> this data still be stored for edits, and if so, for how long?
>>
>> Pine
>>
>> On Sep 28, 2015 7:16 AM, "Nuria Ruiz" <[email protected]> wrote:
>>
>>> Hello,
>>>
>>> We have been working on the exercise of reconstructing an identity using
>>> the (still private) pageview_hourly dataset
>>> (https://wikitech.wikimedia.org/wiki/Analytics/Data/Pageview_hourly).
>>>
>>> TL;DR
>>> It is possible (and easy) to do that with the fields the dataset has
>>> now, so before releasing it publicly we need to further anonymize it.
>>>
>>> More info here:
>>>
>>> https://wikitech.wikimedia.org/wiki/Analytics/Data/PreventingIdentityReconstruction
>>>
>>> Thanks,
>>>
>>> Nuria
>>>
>>> _______________________________________________
>>> Analytics mailing list
>>> [email protected]
>>> https://lists.wikimedia.org/mailman/listinfo/analytics
