I second Leila's question. The issue of how we flag PII data and ensure it's appropriately scrubbed came up in our team meeting yesterday. We're discussing team practices for data/project backups tomorrow and plan to come out with some proposals, at least for the short term.
Are there any existing processes or guidelines I should be aware of?

Thanks!
Kate

--
Kate Zimmerman (she/they)
Head of Product Analytics
Wikimedia Foundation

On Wed, Jul 10, 2019 at 9:00 AM Leila Zia <[email protected]> wrote:

> Hi Luca,
>
> Thanks for the heads up. Isaac is coordinating a response from the
> Research side.
>
> I have one question for you: As you allow/encourage more copies of
> the files to exist, what mechanism would you like to put in place
> to reduce the chances of PII being copied into new folders that will
> then be even harder (for your team) to keep track of? Having an
> explicit process/understanding about this will be very helpful.
>
> Thanks,
> Leila
>
> On Thu, Jul 4, 2019 at 3:14 AM Luca Toscano <[email protected]> wrote:
> >
> > Hi everybody,
> >
> > as part of https://phabricator.wikimedia.org/T201165 the Analytics team
> > thought to reach out to everybody to make it clear that none of the home
> > directories on the stat/notebook nodes are backed up periodically. They
> > run on a software RAID configuration spanning multiple disks of course, so
> > we are resilient to a disk failure, but, even if unlikely, it might happen
> > that a host could lose all its data. Please keep this in mind when working
> > on important projects and/or handling important data that you care about.
> >
> > I just added a warning to
> > https://wikitech.wikimedia.org/wiki/Analytics/Data_access#Analytics_clients.
> > If you have really important data that is too big to back up, keep in mind
> > that you can use your home directory (/user/your-username) on HDFS (that
> > replicates data three times across multiple nodes).
> >
> > Please let us know if you have comments/suggestions/etc. in the
> > aforementioned task.
> >
> > Thanks in advance!
> >
> > Luca (on behalf of the Analytics team)
> > _______________________________________________
> > Wiki-research-l mailing list
> > [email protected]
> > https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
