That's actually a pretty fascinating use case. Our experience on this side of the Atlantic is that few people really care about lineage.

On Thu, Jan 30, 2020 at 9:48 AM [email protected] <[email protected]> wrote:

> I think you have the wrong picture.
>
> Data lineage systems like Atlas and similar are pushed because GDPR
> prescribes it!
> Data lineage is by no means a pure "internal diagnostic" but has a legal
> background.
>
> GDPR defines a recording requirement.
> It states, among other things, that the following must be recorded in an
> audit-proof manner:
> - a description of the categories of personal data
> - a description of the categories of recipients of personal data,
>   including recipients in third countries or international organisations
> - transfers of personal data to a third country or an international
>   organisation
>
> And if you do all this correctly, then you have to make sure that the
> data is erasable again (right to be forgotten).
>
> By the way, this applies not only to dedicated data lineage systems but
> also to all log files, backups etc., at least as long as no other legal
> regulation prohibits this.
> Data lineage is therefore not a nice feature for internal diagnostics
> but a must.
>
> So far, too few companies have thought of this. But more and more are
> recognizing the necessity.
> This is also the reason why formerly Hortonworks and now Cloudera work
> hard on Atlas.
>
> On 30.01.2020 at 15:25, Mike Thomsen wrote:
> > IANAL, but I would be surprised if NiFi provenance data even legally
> > falls under the Right to Be Forgotten because it's internal diagnostic
> > data that is highly ephemeral.
> >
> > On Thu, Jan 30, 2020 at 9:07 AM Emanuel Oliveira <[email protected]> wrote:
> >
> >> Hi, I don't think an API for atomic records makes sense:
> >>
> >> 1. One can configure the retention of data provenance (the default of
> >>    24h is "good enough"; GDPR doesn't need millisecond real-time
> >>    deletion, right?)
> >>    https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#persistent-provenance-repository-properties
> >> 2. Even if there were an API to delete FFs with an attribute =
> >>    <some id>, it would normally be useless as well, since inbound FFs
> >>    normally hold hundreds or thousands of records that need to be
> >>    split and aggregated in a complex flow. Implementing a clean-up at
> >>    that atomic level would be too hard, and the extra effort is not
> >>    needed, since your target single record would surely be part of
> >>    multiple FF UUIDs: some holding only your record, but most surely
> >>    holding hundreds of other records, with your record somewhere in
> >>    the middle.
> >>
> >> In my opinion your answer to business/management gatekeepers is that
> >> data will be stored in data provenance for 24h (default), which can be
> >> configured, and that
> >>
> >> Best Regards,
> >> *Emanuel Oliveira*
> >>
> >> On Thu, Jan 30, 2020 at 1:54 PM [email protected] <[email protected]> wrote:
> >>
> >>> Dear NiFi developer team,
> >>>
> >>> NiFi's Data Provenance and Data Lineage is perfectly adequate in the
> >>> environment of NiFi, so there is often no need to use Atlas.
> >>>
> >>> When using NiFi with customer data a problem arises.
> >>> The problem is the GDPR requirement that a user has the right to be
> >>> forgotten. Unfortunately, I can't find any API call or information on
> >>> how to delete individual user data from the NiFi Provenance Repository
> >>> based on a user-defined attribute and its defined characteristics.
> >>>
> >>> A delete request like "delete all data and dependencies where the
> >>> attribute XYZ has the value 123" is currently not possible to my
> >>> knowledge.
> >>>
> >>> My questions are:
> >>> Is this actually possible and how? And if not, is it planned?
> >>>
> >>> Thanks
> >>> Uwe
