That's actually a pretty fascinating use case. Our experience on this side
of the Atlantic is that few people really care about lineage.

On Thu, Jan 30, 2020 at 9:48 AM [email protected] <[email protected]>
wrote:

> I think you have the wrong picture.
>
> Data lineage systems like Atlas and similar are pushed because GDPR
> prescribes it!
> Data Lineage is by no means a pure "internal diagnostic" but has a legal
> background.
>
> Thus GDPR defines a record-keeping requirement.
> It states, among other things, that the following must be recorded in
> an audit-proof manner:
> - a description of the categories of personal data
> - a description of the categories of recipients of personal data,
> including recipients in third countries or international organisations
> - transfers of personal data to a third country or an international
> organisation
>
> And if you do all this correctly, then you have to make sure that the
> data is erasable again (right to be forgotten).
>
> By the way, this does not only apply to special Data Lineage systems but
> also to all log files, backups etc. At least as long as no other legal
> regulation prohibits this.
> Data Lineage is therefore not a nice feature for internal diagnostics
> but a must.
>
> So far, too few companies have thought of this. But more and more are
> recognizing the necessity.
> This is also the reason why formerly Hortonworks and now Cloudera work
> hard on Atlas.
>
> On 30.01.2020 at 15:25, Mike Thomsen wrote:
> > IANAL, but I would be surprised if NiFi provenance data even legally
> > falls under the Right to Be Forgotten because it's internal diagnostic
> > data that is highly ephemeral.
> >
> > On Thu, Jan 30, 2020 at 9:07 AM Emanuel Oliveira <[email protected]>
> > wrote:
> >
> >> Hi, I don't think an API for atomic records makes sense:
> >>
> >>    1. You can configure the retention of data provenance (the default
> >>    of 24h is "good enough"; GDPR doesn't need millisecond real-time
> >>    deletion, right?)
> >>
> >>    https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#persistent-provenance-repository-properties
> >>    2. Even if there were an API to delete FFs with an attribute =
> >>    <some id>, it would normally be useless as well, since inbound FFs
> >>    normally hold hundreds or thousands of records that need to be
> >>    split and aggregated in a complex flow. Implementing a cleanup at
> >>    such a nano-atomic level would be too hard, and the extra effort is
> >>    not needed, since your single target record would surely be part of
> >>    multiple FF UUIDs: some holding only your record, but most surely
> >>    holding hundreds of other records, with your record somewhere in
> >>    the middle.
> >>
> >>
> >> In my opinion your answer to business/management gatekeepers is that
> >> data will be stored in data provenance for 24h (default), which can be
> >> configured, and that
> >>
> >>
> >> Best Regards,
> >> *Emanuel Oliveira*
> >>
> >>
> >>
> >> On Thu, Jan 30, 2020 at 1:54 PM [email protected] <[email protected]>
> >> wrote:
> >>
> >>> Dear NiFi developer team,
> >>>
> >>> NiFi's Data Provenance and Data Lineage is perfectly adequate in the
> >>> environment of NiFi, so there is often no need to use Atlas.
> >>>
> >>> When using NiFi with customer data a problem arises.
> >>> The problem is the GDPR requirement that a user has the right to be
> >>> forgotten. Unfortunately, I can't find any API call or information on
> >>> how to delete individual user data from the NiFi Provenance Repository
> >>> based on a user-defined attribute and its defined characteristics.
> >>>
> >>> A delete request like "delete all data and dependencies where the
> >>> attribute XYZ has the value 123" is currently not possible to my
> >>> knowledge.
> >>> My questions are:
> >>> Is this actually possible and how? And if not, is it planned?
> >>>
> >>> Thanks
> >>> Uwe
> >>>
>
>
