OK, I just want to make sure that actions taken (with good reason) to
protect privacy don't have the side effect of hindering checkusers' access
to the user agent data. From the discussion here, I'm inferring that there
should be no impact on CU, which is good. Thanks.

Pine
On Sep 28, 2015 12:55 PM, "Tilman Bayer" <[email protected]> wrote:

> Hi Pine,
>
> this mailing list is about analytics, and this thread is about the
> pageview_hourly dataset. There might be better venues for your questions
> about the Checkuser extension (not an analytics tool per se) and the
> data that it stores. Perhaps start by first reading the relevant parts
> of https://www.mediawiki.org/wiki/Extension:CheckUser or
> https://meta.wikimedia.org/wiki/Help:CheckUser#Information_returned .
> As far as I know, there have been no changes there recently.
>
> On Mon, Sep 28, 2015 at 10:52 AM, Pine W <[email protected]> wrote:
> > Hi Nuria,
> >
> > OK, so the useragent data for edits is stored in a different database, is
> > heavily sampled when used for research, and will still be accessible for
> > CU use if user_agent_map is removed from the pageview_hourly data, right?
> >
> > On Mon, Sep 28, 2015 at 10:48 AM, Nuria Ruiz <[email protected]> wrote:
> >>
> >> Pine:
> >>
> >> The pageview_hourly dataset on hive contains pageviews, not edits.
> >>
> >> The majority of data for edits is not associated with a user-agent, as it
> >> is stored in the MediaWiki database. Some of it comes via EventLogging as
> >> experiments are run in, for example, VisualEditor. This second source of
> >> data is of a very different nature than the one we just ran this test on:
> >> it is heavily sampled, not public, and will be purged every 90 days.
> >>
> >> https://wikitech.wikimedia.org/wiki/Analytics/EventLogging#Data_retention_and_auto-purging
> >>
> >>
> >> Thanks,
> >>
> >> Nuria
> >>
> >> On Mon, Sep 28, 2015 at 7:23 AM, Pine W <[email protected]> wrote:
> >>>
> >>> Hi Nuria,
> >>>
> >>> Thanks for working on this.
> >>>
> >>> Removing user_agent_map would be only for readership data, correct?
> >>> Would this data still be stored for edits, and if so, for how long?
> >>>
> >>> Pine
> >>>
> >>> On Sep 28, 2015 7:16 AM, "Nuria Ruiz" <[email protected]> wrote:
> >>>>
> >>>> Hello,
> >>>>
> >>>> We have been working on the exercise of reconstructing an identity
> >>>> using the (still private) pageview_hourly dataset
> >>>> (https://wikitech.wikimedia.org/wiki/Analytics/Data/Pageview_hourly)
> >>>>
> >>>> TL;DR
> >>>> It is possible (and easy) to do that with the fields the dataset has
> >>>> now; before releasing it publicly, we need to further anonymize it.
> >>>>
> >>>> More info here:
> >>>>
> >>>> https://wikitech.wikimedia.org/wiki/Analytics/Data/PreventingIdentityReconstruction
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Nuria
> >>>>
> >>>> _______________________________________________
> >>>> Analytics mailing list
> >>>> [email protected]
> >>>> https://lists.wikimedia.org/mailman/listinfo/analytics
> >>>>
>
> --
> Tilman Bayer
> Senior Analyst
> Wikimedia Foundation
> IRC (Freenode): HaeB