Sorry :-)

Best regards
Kay-Uwe Moosheimer

> On 30.01.2020 at 23:08, Joe Witt <[email protected]> wrote:
>
> Our data provenance is. Just not our repository :)
>
>> On Thu, Jan 30, 2020 at 5:00 PM [email protected] <[email protected]>
>> wrote:
>>
>> Lars
>>
>> You're absolutely right about what you say.
>> If the data in the NiFi repositories is only stored temporarily for a
>> few hours, then documentation is quite sufficient.
>>
>> The original question was how to delete data from the data lineage.
>> I had assumed the NiFi repository could be used as a full Data Lineage
>> System. If NiFi is your central application, then you could avoid having
>> to install Atlas as well. And with Atlas, you would have to install
>> Ranger, Cassandra or even Hadoop and HBase.
>>
>> Joe has already made it clear to me here that Data Provenance/Data
>> Lineage of NiFi is not designed for this yet.
>> Maybe in the future...
>>
>> Best
>> Uwe
>>
>>> On 30.01.2020 at 22:08, Lars Winderling wrote:
>>> Dear Uwe and fellow devs,
>>>
>>> sorry if I completely miss the point here, but I'll try. I am also
>>> working with NiFi under GDPR regulations in the online ad business.
>>> From my point of view it would be sufficient to ensure that no new data
>>> will get stored if a user requests deletion, and to delete all personal
>>> data from all respective systems. The NiFi repos will expire their data,
>>> which can be argued to equal a delayed deletion. Remember that GDPR is
>>> quite strict, but if you have a proper case for this kind of process,
>>> e.g. due to technical limitations, it needs to be documented, and then
>>> it will likely be ok. We do it similarly, and our legal counsel approved
>>> this. My response, however, is not legally binding. The regulation says
>>> something like you should take appropriate measures. If a tool like NiFi
>>> just doesn't let you delete temporarily stored data instantly, this may
>>> seem acceptable.
>>>
>>> Best,
>>> Lars
>>>
>>> On 30 January 2020 at 21:36:31 CET, Mike Thomsen <[email protected]>
>>> wrote:
>>>> I suppose the elephant in the room here is what sort of personal data
>>>> is being stored in your provenance records? Can't you just refactor
>>>> your flows to ensure that the provenance data doesn't meaningfully
>>>> contain anything traceable to a person?
>>>>
>>>> On Thu, Jan 30, 2020 at 12:41 PM [email protected] <[email protected]>
>>>> wrote:
>>>>
>>>>> Emanuel
>>>>>
>>>>> That was not meant disrespectfully by me. And if that's how you felt,
>>>>> then I apologize.
>>>>>
>>>>>> In what sense does NiFi relate to GDPR compliance?
>>>>> All person-related data that flows, is read, sent or stored etc. in a
>>>>> company is GDPR relevant.
>>>>>
>>>>>> - in terms of data FF contents - they are too transient (gone in
>>>>>> 12 hours / default).
>>>>> It makes no difference how long the data is stored. And it makes no
>>>>> difference if data is stored on disk or just in memory.
>>>>>
>>>>> The data can potentially be read, processed by others or sent to other
>>>>> systems and so on. Or the data can be used during this time to
>>>>> establish relationships to other data (pseudo-anonymized data etc.).
>>>>>
>>>>>> I guess discussion is on the fact FF attributes are kept on the data
>>>>>> provenance repo? (gone in 24h / default)
>>>>> I'm afraid not. It's generally a matter of NiFi storing data - as
>>>>> already mentioned, it doesn't make any difference whether it's on the
>>>>> hard disk or just in memory.
>>>>>
>>>>>> I wonder where's the culprit here?
>>>>> There's no culprit here. It's generally a problem with GDPR when
>>>>> processing person-related data.
>>>>> It is a problem of person-related data, and what counts as
>>>>> person-related would fill a book, because machine data can also be
>>>>> person-related, for example if I can relate a person directly to the
>>>>> machine and place/time.
>>>>> This would allow me to track a person/employee and this is not allowed
>>>>> (unless a law allows me to do so).
>>>>>
>>>>> All this goes much further and would be far too much to mention now.
>>>>> In principle, we have a GDPR issue and must act in accordance with the
>>>>> law.
>>>>> We do not agree with all the regulation either. But all regulations I
>>>>> know so far have at least one justification. Even if we as enterprise
>>>>> architects, developers, administrators etc. have our problems with
>>>>> them.
>>>>>
>>>>> Regards
>>>>> Uwe
>>>>>
>>>>> On 30.01.2020 at 17:51, Emanuel Oliveira wrote:
>>>>>> But enlighten me please :) isn't GDPR just about cleaning from
>>>>>> persistent storage?
>>>>>> In what sense does NiFi relate to GDPR compliance?
>>>>>>
>>>>>> - in terms of data FF contents - they are too transient (gone in
>>>>>>   12 hours / default).
>>>>>> - I guess discussion is on the fact FF attributes are kept on the
>>>>>>   data provenance repo? (gone in 24h / default)
>>>>>>
>>>>>> I wonder where's the culprit here? Is it in the situation where one
>>>>>> wants to keep a long trace of data provenance like 6 months, but
>>>>>> because attributes are stored on provenance events, then they must be
>>>>>> deleted?
>>>>>> I guess it can only be a problem of deleting attributes from the
>>>>>> provenance repo and not FF contents, right, as they are gone fast
>>>>>> enough?
>>>>>>
>>>>>> Best Regards,
>>>>>> *Emanuel Oliveira*
>>>>>>
>>>>>> On Thu, Jan 30, 2020 at 4:42 PM Mike Thomsen <[email protected]>
>>>>>> wrote:
>>>>>>>> It was created on this side of the Atlantic because when people do
>>>>>>>> care about such things - they REALLY care.
>>>>>>>
>>>>>>> Agreed. I was just commenting on our particular experiences with
>>>>>>> customers in the federal space.
>>>>>>> There are unfortunately many who still don't get all of the
>>>>>>> accountability and traceability advantages provenance and lineage
>>>>>>> tracking provides.
>>>>>>>
>>>>>>> On Thu, Jan 30, 2020 at 10:32 AM Joe Witt <[email protected]> wrote:
>>>>>>>> Mike,
>>>>>>>>
>>>>>>>> It was created on this side of the Atlantic because when people do
>>>>>>>> care about such things - they REALLY care.
>>>>>>>>
>>>>>>>> I anticipate more and more people will care and I hope that day
>>>>>>>> comes soon. I'm proud of NiFi's ability to be a leader here because
>>>>>>>> if your flow management solution between sensors and processing and
>>>>>>>> storage systems tells you where things came from and went to, it is
>>>>>>>> a heck of a good start. What exists in our provenance data is
>>>>>>>> information about the data but this can be 'any attribute' put on a
>>>>>>>> flow file throughout its life in the flow. We simply cannot
>>>>>>>> guarantee this won't be 'content'. The notion of what is metadata
>>>>>>>> vs content gets blurry fast.
>>>>>>>>
>>>>>>>> Uwe,
>>>>>>>>
>>>>>>>> The data provenance capabilities within NiFi do not support the
>>>>>>>> ability to 'delete records' based on specified parameters. The only
>>>>>>>> mechanism is space or time based age off. For now, whatever the
>>>>>>>> obligation is to respond to a right to be forgotten request should
>>>>>>>> be what the provenance within NiFi is configured to hold. If for
>>>>>>>> instance you have 24 hours then provenance in NiFi should hold no
>>>>>>>> more than 24 hours.
>>>>>>>>
>>>>>>>> I doubt this is something we'll be able to spend time on soon, but I
>>>>>>>> agree the idea of being able to purge out records based on more
>>>>>>>> precise parameters is a good one.
>>>>>>>>
>>>>>>>> The intent is not that the built-in NiFi provenance store is for
>>>>>>>> the long term, but rather that the records are there long enough to
>>>>>>>> support flow management use cases while always being exported to a
>>>>>>>> long term store such as Atlas, or even just stored in HDFS or other
>>>>>>>> locations for additional use. One day...a sweet graph database...
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Joe
>>>>>>>>
>>>>>>>> On Thu, Jan 30, 2020 at 10:29 AM Emanuel Oliveira <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Some recap on NiFi concepts:
>>>>>>>>>
>>>>>>>>> - Content Repository stores FF contents.
>>>>>>>>> - Data Provenance events -used to check lineage of history of FFs-
>>>>>>>>>   only store pointers to FFs (not contents).
>>>>>>>>> - so one can have data deleted and still access lineage/data
>>>>>>>>>   provenance history.
>>>>>>>>>
>>>>>>>>> Here's a lot of in-depth material on the subject, but the 3 points
>>>>>>>>> above are the summary of all:
>>>>>>>>> https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html
>>>>>>>>>
>>>>>>>>> *DATA - persistent data only exists in 2 scenarios:*
>>>>>>>>>
>>>>>>>>> - while your flow file is running.
>>>>>>>>> - archived on the content repository for 12h (to allow access to
>>>>>>>>>   contents when inspecting data provenance/lineage).
>>>>>>>>>
>>>>>>>>> https://community.cloudera.com/t5/Community-Articles/Understanding-how-NiFi-s-Content-Repository-Archiving-works/ta-p/249418
>>>>>>>>>
>>>>>>>>> *PROVENANCE EVENTS (LINEAGE) OF DATA:*
>>>>>>>>>
>>>>>>>>> - contains only provenance attributes and FF uuid etc. but NO
>>>>>>>>>   CONTENTS, available for 24h unless increased/changed in the
>>>>>>>>>   config files.
>>>>>>>>> - https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#persistent-provenance-repository-properties
>>>>>>>>>
>>>>>>>>> So as you see, both contexts expire daily by default, fast enough
>>>>>>>>> that I don't think GDPR is any problem or that any action is
>>>>>>>>> needed.
>>>>>>>>> Now one can always boost retention of just the data provenance
>>>>>>>>> events to months, 1 year or whatever suits. But the data is long
>>>>>>>>> gone anyway.
>>>>>>>>>
>>>>>>>>> Best Regards,
>>>>>>>>> *Emanuel Oliveira*
>>>>>>>>>
>>>>>>>>> On Thu, Jan 30, 2020 at 2:26 PM [email protected] <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>>> GDPR doesn't need millisecond real-time deletion, right?)
>>>>>>>>>> right.
>>>>>>>>>>
>>>>>>>>>>> since inbound FFs normally have hundreds or thousands of records
>>>>>>>>>>> that will need to be split and aggregated in a complex flow.
>>>>>>>>>>> Implementing a clean
>>>>>>>>>> It depends on your application. Not everyone uses NiFi for IoT and
>>>>>>>>>> therefore a single record may be included.
>>>>>>>>>>
>>>>>>>>>>> In my opinion your answer to business/management gate keepers is
>>>>>>>>>>> that data will be stored on data provenance for 24h (default)
>>>>>>>>>>> which can be configured, and that
>>>>>>>>>> This is not necessarily the point of the Data Lineage, that the
>>>>>>>>>> information is deleted after 24 hours (or whatever is configured).
>>>>>>>>>> If Data Lineage is needed (revision, legal requirements etc.),
>>>>>>>>>> then deleting the data after a defined time is not an option.
>>>>>>>>>>
>>>>>>>>>> This is the reason why Atlas supports it.
>>>>>>>>>>
>>>>>>>>>> Best Regards,
>>>>>>>>>> Uwe
>>>>>>>>>>
>>>>>>>>>> On 30.01.2020 at 15:06, Emanuel Oliveira wrote:
>>>>>>>>>>> Hi, I don't think an API for atomic records makes sense:
>>>>>>>>>>>
>>>>>>>>>>> 1. one configures the retention of data provenance (the default
>>>>>>>>>>>    24h is "good enough", GDPR doesn't need millisecond real-time
>>>>>>>>>>>    deletion, right?)
>>>>>>>>>>>    https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#persistent-provenance-repository-properties
>>>>>>>>>>> 2. even if there were an API to delete FFs with an attribute =
>>>>>>>>>>>    <some id>, that would normally be useless as well, since
>>>>>>>>>>>    inbound FFs normally have hundreds or thousands of records
>>>>>>>>>>>    that will need to be split and aggregated in a complex flow.
>>>>>>>>>>>    Implementing a clean-up at a nano-atomic level would be too
>>>>>>>>>>>    hard, and the extra effort is not needed, since your target
>>>>>>>>>>>    single record would surely be part of multiple FF UUIDs, some
>>>>>>>>>>>    only holding your record, but most surely will have 100s of
>>>>>>>>>>>    other records including your record somewhere in the middle.
>>>>>>>>>>>
>>>>>>>>>>> In my opinion your answer to business/management gate keepers is
>>>>>>>>>>> that data will be stored on data provenance for 24h (default)
>>>>>>>>>>> which can be configured, and that
>>>>>>>>>>>
>>>>>>>>>>> Best Regards,
>>>>>>>>>>> *Emanuel Oliveira*
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jan 30, 2020 at 1:54 PM [email protected] <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Dear NiFi developer team,
>>>>>>>>>>>>
>>>>>>>>>>>> NiFi's Data Provenance and Data Lineage is perfectly adequate in
>>>>>>>>>>>> the environment of NiFi, so there is often no need to use Atlas.
>>>>>>>>>>>>
>>>>>>>>>>>> When using NiFi with customer data a problem arises.
>>>>>>>>>>>> The problem is the GDPR requirement that a user has the right to
>>>>>>>>>>>> be forgotten.
>>>>>>>>>>>> Unfortunately, I can't find any API call or information on how
>>>>>>>>>>>> to delete individual user data from the NiFi Provenance
>>>>>>>>>>>> Repository based on a user-defined attribute and its defined
>>>>>>>>>>>> characteristics.
>>>>>>>>>>>> A delete request like "delete all data and dependencies where
>>>>>>>>>>>> the attribute XYZ has the value 123" is currently not possible
>>>>>>>>>>>> to my knowledge.
>>>>>>>>>>>>
>>>>>>>>>>>> My questions are:
>>>>>>>>>>>> Is this actually possible and how? And if not, is it planned?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks
>>>>>>>>>>>> Uwe
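
Joe's advice earlier in the thread (configure provenance to hold no longer than the deadline you have for answering a right-to-be-forgotten request) could be checked with a small script along these lines. This is only a rough sketch and not part of NiFi: the helper names `parse_duration_hours` and `retention_violations` are made up for illustration, the parser only understands simple values such as `24 hours` or `30 days`, and the two property names are the retention settings as I understand them from the Administration Guide linked above, so please verify them against your NiFi version.

```python
import re

# Retention-related keys in nifi.properties (assumed names; check the
# Administration Guide for your NiFi version).
RETENTION_KEYS = (
    "nifi.provenance.repository.max.storage.time",
    "nifi.content.repository.archive.max.retention.period",
)

# Hours per unit for the simple "<number> <unit>" values handled here.
_UNIT_HOURS = {"hour": 1, "hours": 1, "hr": 1, "hrs": 1, "day": 24, "days": 24}


def parse_duration_hours(value):
    """Convert a value such as '24 hours' or '30 days' to hours."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]+)\s*", value)
    if not match or match.group(2).lower() not in _UNIT_HOURS:
        raise ValueError(f"unsupported duration: {value!r}")
    return int(match.group(1)) * _UNIT_HOURS[match.group(2).lower()]


def retention_violations(properties_text, deadline_hours):
    """Return {key: hours} for retention settings longer than the deadline."""
    violations = {}
    for line in properties_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key in RETENTION_KEYS:
            hours = parse_duration_hours(value)
            if hours > deadline_hours:
                violations[key] = hours
    return violations


# Hypothetical excerpt of a nifi.properties file.
example = """
nifi.content.repository.archive.max.retention.period=12 hours
nifi.provenance.repository.max.storage.time=30 days
"""
print(retention_violations(example, deadline_hours=24))
```

With a 24 hour deadline, the example excerpt reports only the provenance setting (30 days, i.e. 720 hours); the 12 hour content archive passes. Note that this only checks time-based age off, which is the mechanism discussed in the thread. It does not delete anything.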
