Sorry :-)

Best regards
Kay-Uwe Moosheimer

> On 30.01.2020 at 23:08, Joe Witt <[email protected]> wrote:
> 
> Our data provenance is.  Just not our repository :)
> 
>> On Thu, Jan 30, 2020 at 5:00 PM [email protected] <[email protected]>
>> wrote:
>> 
>> Lars
>> 
>> You're absolutely right about what you say.
>> If the data in the NiFi repositories is only stored temporarily for a
>> few hours, then documentation is quite sufficient.
>> 
>> The original question was how to delete data from the data lineage.
>> I had assumed the NiFi repository could be used as a full data lineage system.
>> If NiFi is your central application, then you could avoid having to
>> install Atlas as well. And with Atlas, you would have to install Ranger,
>> Cassandra or even Hadoop and HBase.
>> 
>> Joe has already made it clear to me here that Data Provenance/Data
>> Lineage of NiFi is not designed for this yet.
>> Maybe in the future...
>> 
>> Best
>> Uwe
>> 
>>> On 30.01.2020 at 22:08, Lars Winderling wrote:
>>> Dear Uwe and fellow devs,
>>> 
>>> sorry if I completely miss the point here, but I'll try. I am also working
>>> with NiFi under GDPR regulations, in the online ad business. From my point
>>> of view it would be sufficient, when a user requests deletion, to ensure
>>> that no new data gets stored and to delete all personal data from all
>>> respective systems. The NiFi repos will expire their data, which can be
>>> argued to equal a delayed deletion. Remember that GDPR is quite strict, but
>>> if you have a proper case for this kind of process, e.g. due to technical
>>> limitations, it needs to be documented, and then it will likely be OK. We
>>> do it similarly, and our legal counsel approved this. My response, however,
>>> is not legally binding. The regulation says something like: you should take
>>> appropriate measures. If a tool like NiFi just doesn't let you delete
>>> temporarily stored data instantly, this may seem acceptable.
>>> 
>>> Best,
>>> Lars
>>> 
>>> On 30 January 2020 at 21:36:31 CET, Mike Thomsen <[email protected]> wrote:
>>>> I suppose the elephant in the room here is what sort of personal data is
>>>> being stored in your provenance records? Can't you just refactor your flows
>>>> to ensure that the provenance data doesn't meaningfully contain anything
>>>> traceable to a person?
>>>> 
>>>> On Thu, Jan 30, 2020 at 12:41 PM [email protected] <[email protected]> wrote:
>>>> 
>>>>> Emanuel
>>>>> 
>>>>> That was not meant disrespectfully by me. And if that's how you felt,
>>>>> then I apologize.
>>>>> 
>>>>>> In what sense does NiFi relate to GDPR compliance?
>>>>> All person-related data that flows, is read, sent, stored etc. in a
>>>>> company is GDPR-relevant.
>>>>> 
>>>>>> - in terms of data FF contents - they are too transient (gone in 12 hours /
>>>>>> default).
>>>>> It makes no difference how long the data is stored. And it makes no
>>>>> difference if data is stored on disk or just in memory.
>>>>> 
>>>>> The data can potentially be read, processed by others or sent to other
>>>>> systems and so on. Or the data can be used during this time to establish
>>>>> relationships to other data (pseudonymized data etc.).
>>>>> 
>>>>>> I guess the discussion is about the fact that FF attributes are kept in the
>>>>>> data provenance repo? (gone in 24h / default)
>>>>> I'm afraid not. It's generally a matter of NiFi storing data - as
>>>>> already mentioned, it doesn't make any difference whether it's on the
>>>>> hard disk or just in memory.
>>>>> 
>>>>>> I wonder where the culprit is here?
>>>>> There's no culprit here. It's generally a problem with GDPR when
>>>>> processing person-related data.
>>>>> What counts as person-related would fill a book, because machine data can
>>>>> also be person-related, for example if I can relate a person directly to
>>>>> the machine and place/time.
>>>>> This would allow me to track a person/employee, and this is not allowed
>>>>> (unless a law allows me to do so).
>>>>> 
>>>>> All this goes much further and would be far too much to mention now.
>>>>> In principle, we have a GDPR issue and must act in accordance with the law.
>>>>> We do not agree with all the regulations either. But all regulations I
>>>>> know so far have at least one justification, even if we as enterprise
>>>>> architects, developers, administrators etc. have our problems with them.
>>>>>
>>>>> Regards
>>>>> Uwe
>>>>> 
>>>>> On 30.01.2020 at 17:51, Emanuel Oliveira wrote:
>>>>>> But enlighten me please :) isn't GDPR just about cleaning data from
>>>>>> persistent storage?
>>>>>> In what sense does NiFi relate to GDPR compliance?
>>>>>>
>>>>>>   - in terms of data FF contents - they are too transient (gone in 12 hours /
>>>>>>   default).
>>>>>>   - I guess the discussion is about the fact that FF attributes are kept in
>>>>>>   the data provenance repo? (gone in 24h / default)
>>>>>>
>>>>>> I wonder where the culprit is here? Is it the situation where one wants
>>>>>> to keep a long trace of data provenance, like 6 months, but because
>>>>>> attributes are stored on provenance events, they must be deleted?
>>>>>> I guess it can only be a problem of deleting attributes from the provenance
>>>>>> repo and not FF contents, right, as they are gone fast enough?
>>>>>> 
>>>>>> Best Regards,
>>>>>> *Emanuel Oliveira*
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Thu, Jan 30, 2020 at 4:42 PM Mike Thomsen <[email protected]> wrote:
>>>>>>>> It was created on this side of the Atlantic because when people do care
>>>>>>>> about such things - they REALLY care.
>>>>>>>
>>>>>>> Agreed. I was just commenting on our particular experiences with customers
>>>>>>> in the federal space. There are unfortunately many who still don't get all
>>>>>>> of the accountability and traceability advantages that provenance and
>>>>>>> lineage tracking provide.
>>>>>>> 
>>>>>>> On Thu, Jan 30, 2020 at 10:32 AM Joe Witt <[email protected]> wrote:
>>>>>>>> Mike,
>>>>>>>> 
>>>>>>>> It was created on this side of the Atlantic because when people do care
>>>>>>>> about such things - they REALLY care.
>>>>>>>> 
>>>>>>>> I anticipate more and more people will care and I hope that day comes
>>>>>>>> soon.  I'm proud of NiFi's ability to be a leader here because if your flow
>>>>>>>> management solution between sensors and processing and storage systems
>>>>>>>> tells you where things came from and went to, it is a heck of a good start.
>>>>>>>> What exists in our provenance data is information about the data, but this
>>>>>>>> can be 'any attribute' put on a flow file throughout its life in the flow.
>>>>>>>> We simply cannot guarantee this won't be 'content'.  The notion of what is
>>>>>>>> metadata vs content gets blurry fast.
>>>>>>>> 
>>>>>>>> Uwe,
>>>>>>>> 
>>>>>>>> The data provenance capabilities within NiFi do not support the ability to
>>>>>>>> 'delete records' based on specified parameters.  The only mechanism is
>>>>>>>> space- or time-based age-off.  For now, whatever the obligation is to
>>>>>>>> respond to a right-to-be-forgotten request should be what the provenance
>>>>>>>> within NiFi is configured to hold.  If for instance you have 24 hours, then
>>>>>>>> provenance in NiFi should hold no more than 24 hours.
>>>>>>>> 
>>>>>>>> I doubt this is something we'll be able to spend time on soon, but I agree
>>>>>>>> that being able to purge records based on more precise parameters is a
>>>>>>>> good idea.
>>>>>>>> 
>>>>>>>> The intent is not that the built-in NiFi provenance store is for the long
>>>>>>>> term, but rather that the records are there long enough to support flow
>>>>>>>> management use cases while always being exported to a long-term store
>>>>>>>> such as Atlas, or even just stored in HDFS or other locations for
>>>>>>>> additional use.  One day...a sweet graph database...
>>>>>>>> 
>>>>>>>> Thanks
>>>>>>>> Joe
>>>>>>>> 
>>>>>>>> On Thu, Jan 30, 2020 at 10:29 AM Emanuel Oliveira <[email protected]> wrote:
>>>>>>>> 
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> Some recap on NiFi concepts:
>>>>>>>>> 
>>>>>>>>>   - Content Repository stores FF contents.
>>>>>>>>>   - Data Provenance events (used to check the lineage and history of FFs)
>>>>>>>>>   only store pointers to FFs (not contents).
>>>>>>>>>   - So one can have data deleted and still access lineage/data provenance
>>>>>>>>>   history.
>>>>>>>>> 
>>>>>>>>> Here's a lot of in-depth material on the subject, but the above 3 points
>>>>>>>>> are the summary of it all:
>>>>>>>>> https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> *DATA - persistent data only exists in 2 scenarios:*
>>>>>>>>> 
>>>>>>>>>   - while your flow file is running.
>>>>>>>>>   - archived in the content repository for 12h (to allow access to the
>>>>>>>>>   contents when inspecting data provenance/lineage).
>>>>>>>>> 
>>>>>>>>> https://community.cloudera.com/t5/Community-Articles/Understanding-how-NiFi-s-Content-Repository-Archiving-works/ta-p/249418
>>>>>>>>> 
>>>>>>>>> *PROVENANCE EVENTS (LINEAGE) OF DATA:*
>>>>>>>>> 
>>>>>>>>>   - contain only provenance attributes, the FF UUID etc. but NO CONTENTS,
>>>>>>>>>   available for 24h unless increased/changed in the config files.
>>>>>>>>>   - https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#persistent-provenance-repository-properties
>>>>>>>>> 
>>>>>>>>> So as you see, both expire daily by default, fast enough that I don't
>>>>>>>>> think GDPR is any problem or that any action is needed.
>>>>>>>>> Now one can always boost the retention of just the data provenance events
>>>>>>>>> to months, 1 year or whatever suits. But the data is long gone anyway.
>>>>>>>>> 
>>>>>>>>> Best Regards,
>>>>>>>>> *Emanuel Oliveira*
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Thu, Jan 30, 2020 at 2:26 PM [email protected] <[email protected]> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi,
>>>>>>>>>> 
>>>>>>>>>>> GDPR doesn't need millisecond real-time deletion, right?)
>>>>>>>>>> right.
>>>>>>>>>> 
>>>>>>>>>>> since inbound FFs have
>>>>>>>>>>>   normally hundreds, thousands of records that will need to split, aggregate,
>>>>>>>>>>>   in complex flow file, implementing a clean
>>>>>>>>>> It depends on your application. Not everyone uses NiFi for IoT, so a
>>>>>>>>>> flow file may well contain just a single record.
>>>>>>>>>> 
>>>>>>>>>>> In my opinion your answer to business/management gate keepers is that data
>>>>>>>>>>> will be stored on data provenance for 24h (default) which can be
>>>>>>>>>>> configured, and that
>>>>>>>>>> Deleting the information after 24 hours (or whatever is configured) is
>>>>>>>>>> not necessarily the point of data lineage.
>>>>>>>>>> If data lineage is needed (auditing, legal requirements etc.), then
>>>>>>>>>> deleting the data after a defined time is not an option.
>>>>>>>>>> 
>>>>>>>>>> This is the reason why Atlas supports it.
>>>>>>>>>> 
>>>>>>>>>> Best Regards,
>>>>>>>>>> Uwe
>>>>>>>>>> 
>>>>>>>>>> On 30.01.2020 at 15:06, Emanuel Oliveira wrote:
>>>>>>>>>>> Hi, I don't think an API for atomic records makes sense:
>>>>>>>>>>> 
>>>>>>>>>>>   1. one configures the retention of data provenance (the default of 24h is
>>>>>>>>>>>   "good enough"; GDPR doesn't need millisecond real-time deletion, right?)
>>>>>>>>>>>   https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#persistent-provenance-repository-properties
>>>>>>>>>>>   2. even if there were an API to delete FFs with an attribute =
>>>>>>>>>>>   <some id>, that would normally be useless as well, since inbound FFs
>>>>>>>>>>>   normally have hundreds or thousands of records that need to be split and
>>>>>>>>>>>   aggregated in a complex flow. Implementing a clean-up at such a
>>>>>>>>>>>   nano-atomic level would be too hard, and the extra effort is not needed,
>>>>>>>>>>>   since your target single record would surely be part of multiple FF
>>>>>>>>>>>   UUIDs, some holding only your record, but most surely holding 100s or
>>>>>>>>>>>   1000s of other records with your record somewhere in the middle.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> In my opinion your answer to business/management gatekeepers is that data
>>>>>>>>>>> will be stored in data provenance for 24h (default), which can be
>>>>>>>>>>> configured, and that
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Best Regards,
>>>>>>>>>>> *Emanuel Oliveira*
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Thu, Jan 30, 2020 at 1:54 PM [email protected] <[email protected]> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Dear NiFi developer team,
>>>>>>>>>>>> 
>>>>>>>>>>>> NiFi's Data Provenance and Data Lineage are perfectly adequate in the
>>>>>>>>>>>> environment of NiFi, so there is often no need to use Atlas.
>>>>>>>>>>>> 
>>>>>>>>>>>> When using NiFi with customer data, a problem arises.
>>>>>>>>>>>> The problem is the GDPR requirement that a user has the right to be
>>>>>>>>>>>> forgotten. Unfortunately, I can't find any API call or information on
>>>>>>>>>>>> how to delete individual user data from the NiFi Provenance Repository
>>>>>>>>>>>> based on a user-defined attribute and its defined characteristics.
>>>>>>>>>>>> A delete request like "delete all data and dependencies where the
>>>>>>>>>>>> attribute XYZ has the value 123" is currently not possible to my
>>>>>>>>>>>> knowledge.
>>>>>>>>>>>> 
>>>>>>>>>>>> My questions are:
>>>>>>>>>>>> Is this actually possible and how? And if not, is it planned?
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks
>>>>>>>>>>>> Uwe
>>>>>>>>>>>> 
>>>>> 
>> 
>> 

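Joe's advice further down the thread (configure NiFi's provenance to hold no longer than your right-to-be-forgotten response window) could be checked with something like the sketch below. It is only an illustration under assumptions: the two property names are the retention settings described in the NiFi Administration Guide linked in the thread, while the helper functions, the handful of supported time units and the sample values are made up for this example. It also only compares configured values; it says nothing about when NiFi actually ages data off.

```python
import re
from datetime import timedelta

# Retention properties from nifi.properties (see the NiFi Administration Guide).
PROVENANCE_KEY = "nifi.provenance.repository.max.storage.time"
CONTENT_ARCHIVE_KEY = "nifi.content.repository.archive.max.retention.period"

# Assumption: only these unit spellings are handled in this sketch.
_UNIT_SECONDS = {
    "sec": 1, "secs": 1, "second": 1, "seconds": 1,
    "min": 60, "mins": 60, "minute": 60, "minutes": 60,
    "hr": 3600, "hrs": 3600, "hour": 3600, "hours": 3600,
    "day": 86400, "days": 86400,
}


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def parse_duration(value):
    """Turn a value such as '24 hours' into a timedelta."""
    match = re.fullmatch(r"(\d+)\s*([A-Za-z]+)", value.strip())
    if not match or match.group(2).lower() not in _UNIT_SECONDS:
        raise ValueError(f"unsupported duration: {value!r}")
    return timedelta(seconds=int(match.group(1)) * _UNIT_SECONDS[match.group(2).lower()])


def retention_violations(props_text, erasure_deadline):
    """Return the retention keys whose configured value exceeds the deadline."""
    props = parse_properties(props_text)
    violations = []
    for key in (PROVENANCE_KEY, CONTENT_ARCHIVE_KEY):
        if key in props and parse_duration(props[key]) > erasure_deadline:
            violations.append(key)
    return violations


if __name__ == "__main__":
    # Sample values matching the defaults quoted in the thread (24h / 12h).
    sample = (
        "nifi.provenance.repository.max.storage.time=24 hours\n"
        "nifi.content.repository.archive.max.retention.period=12 hours\n"
    )
    print(retention_violations(sample, timedelta(hours=12)))
```

With the defaults quoted in the thread and a 12-hour deadline, only the provenance setting would be flagged; with a 24-hour deadline, nothing would be.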