[
https://issues.apache.org/jira/browse/NIFI-3709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16291346#comment-16291346
]
ASF GitHub Bot commented on NIFI-3709:
--------------------------------------
Github user markap14 commented on the issue:
https://github.com/apache/nifi/pull/2335
@ijokarumawak thanks for all of the work that you've put into this - it is
very much a non-trivial effort! For the most part, the code looks good. I
flagged a couple of minor things in the code, and 1 or 2 thread-safety issues
that should be easy to address.
The only 'more significant' concern that I have is the use of the
dummied-up NiFiUser. As-is, this is an anonymous user, and in a secured
environment it will not be able to retrieve the necessary event details. It
also means that we would be validating events against a user who doesn't even
exist.
I think there are 2 ways to approach this: first, as I noted inline, we
could have a property to define which user the queries should run on behalf of.
So the user could add a "NiFi Atlas" user and use that. However, that's also a
bit concerning because it means that whoever has access to edit the reporting
task can run provenance queries on behalf of another user.
By far, my preference is to actually just update the ProvenanceRepository
implementations (there are 4 now, I think) so that if a null User is passed in,
we don't check permissions. This would mean that you can pass in null from the
Reporting Task. We could also then update the interface to have an overloaded
method that does not require that a user be given.
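To make the preferred option concrete, here is a minimal sketch of the idea,
not the actual NiFi code. Every name in it (User, Event, EventRepository,
InMemoryRepository) is a hypothetical, simplified stand-in and does not
reproduce the real NiFiUser or ProvenanceRepository signatures. It only shows
the shape of the change: a null user skips the permission check, and an
overloaded method without a user parameter delegates with null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Illustrative sketch only: all types here are hypothetical stand-ins,
// not the real NiFi interfaces.
class NullUserProvenanceSketch {

    /** Hypothetical user: an identity plus the component ids it may read. */
    static final class User {
        final String identity;
        final Set<String> readableComponents;

        User(String identity, Set<String> readableComponents) {
            this.identity = identity;
            this.readableComponents = readableComponents;
        }
    }

    /** Hypothetical provenance event: an id and the component that emitted it. */
    static final class Event {
        final long id;
        final String componentId;

        Event(long id, String componentId) {
            this.id = id;
            this.componentId = componentId;
        }
    }

    interface EventRepository {
        /** Returns events visible to the user; a null user means "do not check permissions". */
        List<Event> getEvents(long firstId, int maxRecords, User user);

        /** Overload for callers, such as a reporting task, that have no user. */
        default List<Event> getEvents(long firstId, int maxRecords) {
            return getEvents(firstId, maxRecords, null);
        }
    }

    static final class InMemoryRepository implements EventRepository {
        private final List<Event> events;

        InMemoryRepository(List<Event> events) {
            this.events = new ArrayList<>(events);
        }

        @Override
        public List<Event> getEvents(long firstId, int maxRecords, User user) {
            List<Event> result = new ArrayList<>();
            for (Event e : events) {
                if (result.size() >= maxRecords) {
                    break;
                }
                if (e.id < firstId) {
                    continue;
                }
                // Only consult permissions when a user was actually supplied.
                if (user == null || user.readableComponents.contains(e.componentId)) {
                    result.add(e);
                }
            }
            return result;
        }
    }

    public static void main(String[] args) {
        EventRepository repo = new InMemoryRepository(Arrays.asList(
                new Event(1L, "A"), new Event(2L, "B"), new Event(3L, "A")));
        User limited = new User("limited", Collections.singleton("A"));

        System.out.println("limited user sees: " + repo.getEvents(1L, 10, limited).size());
        System.out.println("no user sees:      " + repo.getEvents(1L, 10).size());
    }
}
```

With this shape, the Reporting Task would call the two-argument overload and
never need to construct a placeholder user, while existing callers that pass a
real user keep the current authorization behavior.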
Once that is addressed, I think it is a +1 from me from a code review
perspective.
Thanks
-Mark
> Export NiFi flow dataset lineage to Apache Atlas
> ------------------------------------------------
>
> Key: NIFI-3709
> URL: https://issues.apache.org/jira/browse/NIFI-3709
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Extensions
> Reporter: Koji Kawamura
> Assignee: Koji Kawamura
>
> While Apache NiFi has provenance and event-level lineage support within its
> data flow, Apache Atlas also manages lineage between datasets and the
> processes that interact with such data.
> It would be beneficial for users of both NiFi and Atlas if they could see
> end-to-end data lineage on the Atlas lineage graph, as some types of datasets
> (for example, Kafka topics and Hive tables) are processed by both NiFi and
> technologies around Atlas such as Storm, Falcon or Sqoop.
> In order to make this integration happen, I propose a NiFi reporting task
> that analyzes the NiFi flow and then creates DataSet and Process entities in
> Atlas.
> The challenge is how to represent NiFi flow dataset-level lineage within the
> Atlas lineage graph.
> If we just add a single NiFi process and connect every DataSet from/to it, it
> would be too ambiguous since it won't be clear which part of a NiFi flow
> actually interacts with a given dataset.
> But if we represent every NiFi processor as an independent process in Atlas,
> it would be too granular. Also, since we already have detailed event-level
> lineage in NiFi, we wouldn't need the same level in Atlas.
> If we can group certain processors in a NiFi flow as a process in Atlas, it
> would be a suitable granularity.