I second Leila's question. How we flag PII and ensure it's
appropriately scrubbed came up in our team meeting yesterday. We're
discussing team practices for data/project backups tomorrow and plan to
come up with some proposals, at least for the short term.

Are there any existing processes or guidelines I should be aware of?

Thanks!
Kate

--

Kate Zimmerman (she/they)
Head of Product Analytics
Wikimedia Foundation


On Wed, Jul 10, 2019 at 9:00 AM Leila Zia <[email protected]> wrote:

> Hi Luca,
>
> Thanks for the heads up. Isaac is coordinating a response from the
> Research side.
>
> I have one question for you: as you allow/encourage more copies of
> the files to exist, what mechanism would you like to put in place
> to reduce the chances of PII being copied into new folders that
> will then be even harder (for your team) to keep track of? Having an
> explicit process/understanding about this will be very helpful.
>
> Thanks,
> Leila
>
>
> On Thu, Jul 4, 2019 at 3:14 AM Luca Toscano <[email protected]> wrote:
> >
> > Hi everybody,
> >
> > as part of https://phabricator.wikimedia.org/T201165, the Analytics team
> > wanted to reach out to everybody to make it clear that none of the home
> > directories on the stat/notebook nodes are backed up periodically.
> > They run on a software RAID configuration spanning multiple disks, so
> > we are resilient to a disk failure, but, although unlikely, a host
> > could still lose all its data. Please keep this in mind when working
> > on important projects and/or handling important data that you care about.
> >
> > I just added a warning to
> > https://wikitech.wikimedia.org/wiki/Analytics/Data_access#Analytics_clients.
> > If you have really important data that is too big to back up, keep in mind
> > that you can use your home directory (/user/your-username) on HDFS, which
> > replicates data three times across multiple nodes.
> >
> > Please let us know if you have any comments/suggestions/etc. in the
> > aforementioned task.
> >
> > Thanks in advance!
> >
> > Luca (on behalf of the Analytics team)
> > _______________________________________________
> > Wiki-research-l mailing list
> > [email protected]
> > https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
>
>
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
