I think you have the wrong picture.

Data lineage systems such as Atlas are being pushed because GDPR
requires it!
Data lineage is by no means purely "internal diagnostic" data; it has a
legal basis.

GDPR defines a record-keeping requirement.
Among other things, it states that the following must be recorded in an
audit-proof manner:
- a description of the categories of personal data
- a description of the categories of recipients of personal data,
  including recipients in third countries or international organisations
- transfers of personal data to a third country or an international
  organisation

And once you record all of this correctly, you also have to make sure
that the data can be erased again (right to be forgotten).

By the way, this applies not only to dedicated data lineage systems but
also to all log files, backups, etc., at least as long as no other legal
regulation prohibits the erasure.
Data lineage is therefore not a nice-to-have feature for internal
diagnostics but a must.
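
To make the scope concrete, here is a minimal Python sketch of what an
erasure request has to cover. It is not NiFi or Atlas API: the Event
class, the erase_subject function and the "customer.id" attribute are
hypothetical names made up for illustration. The only point is that the
same attribute match has to be applied to every store (lineage, logs,
backups), not just to one of them.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical stand-in for one lineage, log, or backup entry."""
    event_id: str
    store: str  # which system holds the entry, e.g. "lineage", "log", "backup"
    attributes: dict = field(default_factory=dict)


def erase_subject(events, attr, value):
    """Split events into (kept, erased) based on an exact attribute match."""
    kept, erased = [], []
    for ev in events:
        if ev.attributes.get(attr) == value:
            erased.append(ev)
        else:
            kept.append(ev)
    return kept, erased


# Assumed sample data: one customer shows up in two different stores.
events = [
    Event("e1", "lineage", {"customer.id": "123"}),
    Event("e2", "log", {"customer.id": "456"}),
    Event("e3", "backup", {"customer.id": "123"}),
]

kept, erased = erase_subject(events, "customer.id", "123")
print("stores touched by the erasure:", sorted({ev.store for ev in erased}))
print("entries kept:", [ev.event_id for ev in kept])
```

A real implementation would also have to handle entries that bundle
several data subjects in one record, which this sketch ignores.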

So far, too few companies have thought about this, but more and more are
recognizing the necessity.
This is also why Hortonworks, and now Cloudera, have been working hard
on Atlas.

On 30.01.2020 at 15:25, Mike Thomsen wrote:
> IANAL, but I would be surprised if NiFi provenance data even legally falls
> under the Right to Be Forgotten because it's internal diagnostic data that
> is highly ephemeral.
>
> On Thu, Jan 30, 2020 at 9:07 AM Emanuel Oliveira <[email protected]> wrote:
>
>> Hi, I don't think an API for atomic records makes sense:
>>
>>    1. You can configure the retention of data provenance (the default of
>>    24h is "good enough"; GDPR doesn't need millisecond real-time
>>    deletion, right?)
>>
>> https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#persistent-provenance-repository-properties
>>    2. Even if there were an API to delete FFs with an attribute =
>>    <some id>, it would normally be useless as well, since inbound FFs
>>    normally hold hundreds or thousands of records that need to be split
>>    and aggregated in complex flows. Implementing a clean-up at a nano,
>>    atomic level would be too hard, and the extra effort is not needed,
>>    since your single target record would surely be part of multiple FF
>>    UUIDs: some holding only your record, but most surely holding
>>    hundreds of other records with your record somewhere in the middle.
>>
>>
>> In my opinion, your answer to the business/management gatekeepers is that
>> data will be stored in data provenance for 24h (default), which can be
>> configured, and that
>>
>>
>> Best Regards,
>> *Emanuel Oliveira*
>>
>>
>>
>> On Thu, Jan 30, 2020 at 1:54 PM [email protected] <[email protected]>
>> wrote:
>>
>>> Dear NiFi developer team,
>>>
>>> NiFi's Data Provenance and Data Lineage are perfectly adequate in the
>>> environment of NiFi, so there is often no need to use Atlas.
>>>
>>> When using NiFi with customer data a problem arises.
>>> The problem is the GDPR requirement that a user has the right to be
>>> forgotten. Unfortunately, I can't find any API call or information on
>>> how to delete individual user data from the NiFi Provenance Repository
>>> based on a user-defined attribute and its defined characteristics.
>>>
>>> A delete request like "delete all data and dependencies where the
>>> attribute XYZ has the value 123" is currently not possible to my
>>> knowledge.
>>>
>>> My questions are:
>>> Is this actually possible and how? And if not, is it planned?
>>>
>>> Thanks
>>> Uwe
>>>
