[ https://issues.apache.org/jira/browse/NIFI-4993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Kawamura updated NIFI-4993:
--------------------------------
    Description: 
The ReportLineageToAtlas 'complete path' strategy issues NiFi provenance 
lineage queries as an anonymous user. If NiFi is secured and the user who made 
the lineage query request does not have the required privileges, NiFi returns 
the provenance event types as UNKNOWN and does not traverse the lineage fully.

Specifically, the authorization is implemented here:
 
[https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-provenance-repository-bundle/nifi-persistent-provenance-repository/src/main/java/org/apache/nifi/provenance/PersistentProvenanceRepository.java#L2641]
{code:java|title=PersistentProvenanceRepository$ComputeLineageRunnable.run}
final StandardLineageResult result = submission.getResult();
result.update(replaceUnauthorizedWithPlaceholders(matchingRecords, user), 
matchingRecords.size());
{code}
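The sketch below is a minimal, hypothetical model of the masking effect described above. Events that the querying user may not read come back with their type replaced by UNKNOWN. The `Event` record, `maskUnauthorized` method and component IDs are illustrative assumptions. They are not NiFi's `replaceUnauthorizedWithPlaceholders` implementation or its provenance types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

public class PlaceholderMaskingSketch {

    // Hypothetical, simplified stand-in for a provenance event (not NiFi's ProvenanceEventRecord).
    record Event(long id, String type, String componentId) {}

    // Returns a copy of the events in which every event the user may not read keeps its
    // ID and component ID (in this sketch) but has its type replaced with "UNKNOWN".
    static List<Event> maskUnauthorized(List<Event> events, Predicate<Event> isAuthorized) {
        List<Event> result = new ArrayList<>();
        for (Event e : events) {
            result.add(isAuthorized.test(e) ? e : new Event(e.id(), "UNKNOWN", e.componentId()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                new Event(1, "RECEIVE", "proc-a"),
                new Event(2, "CLONE", "proc-a"),
                new Event(3, "SEND", "proc-b"));
        // Hypothetical policy: the querying user may only read events from "proc-b".
        Set<String> readable = Set.of("proc-b");
        List<Event> masked = maskUnauthorized(events, e -> readable.contains(e.componentId()));
        // Events 1 and 2 come back as UNKNOWN; event 3 keeps its SEND type.
        masked.forEach(System.out::println);
    }
}
```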
This affects the ReportLineageToAtlas 'complete path' strategy, because it 
cannot traverse parent provenance events to analyze the full lineage path of a 
FlowFile. As a result, the reporting task cannot report lineage for some flow 
structures.

For example, in the following NiFi flow, the FlowFile RECEIVEd by GetFile went 
through the Kafka route (the right branch). The FlowFile was also CLONEd to go 
through the Hive and HDFS routes.

!flow-screenshot.png|width=100%!

The original FlowFile that went through the Kafka route has the NiFi lineage 
shown below. This lineage can be retrieved by a single lineage query, which 
works even with an anonymous user, so these routes can be reported to Atlas:
 !kafka-route.png|width=180!

However, the CLONEd routes have the following lineage. This graph was queried 
from the NiFi UI by a NiFi user with sufficient privileges. With an anonymous 
user, the link from SEND (23) back through the FlowFile to CLONE (18) is not 
returned, because the event types are masked as UNKNOWN and the NiFi framework 
does not traverse the linkage. Thus, these cloned routes are not reported to 
Atlas.
 !hdfs-route.png!
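The toy traversal below sketches why masking cuts the path. It walks parent events starting from SEND (23) and does not expand UNKNOWN events, so it never reaches CLONE (18). Everything here is a hypothetical model of the behavior described above, not NiFi's lineage computation code. That includes the method name, the event-only graph shape, events 21 and 10, and the CONTENT_MODIFIED type.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class LineageTraversalSketch {

    // Walks parent links starting at `start` and returns every event ID reached, in visit order.
    // `parents` maps an event ID to its parent event IDs; `types` maps an event ID to its type.
    // A masked ("UNKNOWN") event is included in the result, but its parents are not followed.
    static Set<Long> reachableParents(long start, Map<Long, List<Long>> parents, Map<Long, String> types) {
        Set<Long> visited = new LinkedHashSet<>();
        Deque<Long> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty()) {
            long id = queue.poll();
            if (!visited.add(id)) {
                continue; // already visited
            }
            if ("UNKNOWN".equals(types.get(id))) {
                continue; // masked event: stop traversing this branch
            }
            for (long parent : parents.getOrDefault(id, List.of())) {
                queue.add(parent);
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // Toy lineage: SEND (23) <- 21 <- CLONE (18) <- RECEIVE (10); IDs 21 and 10 are made up.
        Map<Long, List<Long>> parents = Map.of(
                23L, List.of(21L),
                21L, List.of(18L),
                18L, List.of(10L));
        Map<Long, String> privileged = Map.of(23L, "SEND", 21L, "CONTENT_MODIFIED", 18L, "CLONE", 10L, "RECEIVE");
        Map<Long, String> anonymous = Map.of(23L, "SEND", 21L, "UNKNOWN", 18L, "CLONE", 10L, "RECEIVE");
        System.out.println(reachableParents(23L, parents, privileged)); // [23, 21, 18, 10]
        System.out.println(reachableParents(23L, parents, anonymous));  // [23, 21]
    }
}
```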

-ReportLineageToAtlas needs a property that lets the user specify a NiFi user 
ID to impersonate, so that the required policies can be administered. 1st PR 
[2567|https://github.com/apache/nifi/pull/2567]-
-Instead of letting the user specify a NiFi user ID, the updated 2nd PR 
([2577|https://github.com/apache/nifi/pull/2577]) fixes lineage computation 
for unauthorized users.-
The 3rd PR ([2589|https://github.com/apache/nifi/pull/2589]) attempts to fix 
this issue by modifying the ProvenanceRepository implementations to accept a 
null user, so that lineage queries can be invoked by NiFi internal components.
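The sketch below shows the idea behind the 3rd PR under a simplified model. A null user is treated as a NiFi internal caller and receives unmasked events. Any other user still gets UNKNOWN placeholders for events it cannot read. `User`, `Event` and `authorize` are hypothetical names and do not reflect the actual ProvenanceRepository signatures changed in the PR.

```java
import java.util.List;
import java.util.Set;

public class NullUserAuthorizationSketch {

    // Hypothetical, simplified provenance event.
    record Event(long id, String type, String componentId) {}

    // Hypothetical user model: the set of component IDs whose events this user may read.
    record User(String identity, Set<String> readableComponents) {}

    // A null user is treated as a trusted internal caller (e.g. a reporting task) and gets
    // events unmodified; any other user gets UNKNOWN placeholders for unreadable events.
    static List<Event> authorize(List<Event> events, User user) {
        if (user == null) {
            return events; // internal caller: no masking
        }
        return events.stream()
                .map(e -> user.readableComponents().contains(e.componentId())
                        ? e
                        : new Event(e.id(), "UNKNOWN", e.componentId()))
                .toList();
    }

    public static void main(String[] args) {
        List<Event> events = List.of(new Event(18, "CLONE", "proc-a"), new Event(23, "SEND", "proc-b"));
        User anonymous = new User("anonymous", Set.of("proc-b"));
        System.out.println(authorize(events, anonymous)); // CLONE event is masked as UNKNOWN
        System.out.println(authorize(events, null));      // internal caller sees both event types
    }
}
```

In this sketch, any caller that can pass `null` bypasses masking. That path would therefore need to stay limited to internal components such as the reporting task.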

This issue was originally reported by [~nayakmahesh616].

A simplified NiFi flow template to test the proposed fix is attached.

  was:
The ReportLineageToAtlas 'complete path' strategy issues NiFi provenance 
lineage queries as an anonymous user. If NiFi is secured and the user who made 
the lineage query request does not have the required privileges, NiFi returns 
the provenance event types as UNKNOWN and does not traverse the lineage fully.

Specifically, the authorization is implemented here:
 
[https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-provenance-repository-bundle/nifi-persistent-provenance-repository/src/main/java/org/apache/nifi/provenance/PersistentProvenanceRepository.java#L2641]
{code:java|title=PersistentProvenanceRepository$ComputeLineageRunnable.run}
final StandardLineageResult result = submission.getResult();
result.update(replaceUnauthorizedWithPlaceholders(matchingRecords, user), 
matchingRecords.size());
{code}
This affects the ReportLineageToAtlas 'complete path' strategy, because it 
cannot traverse parent provenance events to analyze the full lineage path of a 
FlowFile. As a result, the reporting task cannot report lineage for some flow 
structures.

For example, in the following NiFi flow, the FlowFile RECEIVEd by GetFile went 
through the Kafka route (the right branch). The FlowFile was also CLONEd to go 
through the Hive and HDFS routes.

!flow-screenshot.png|width=100%!

The original FlowFile that went through the Kafka route has the NiFi lineage 
shown below. This lineage can be retrieved by a single lineage query, which 
works even with an anonymous user, so these routes can be reported to Atlas:
 !kafka-route.png|width=180!

However, the CLONEd routes have the following lineage. This graph was queried 
from the NiFi UI by a NiFi user with sufficient privileges. With an anonymous 
user, the link from SEND (23) back through the FlowFile to CLONE (18) is not 
returned, because the event types are masked as UNKNOWN and the NiFi framework 
does not traverse the linkage. Thus, these cloned routes are not reported to 
Atlas.
 !hdfs-route.png!

-ReportLineageToAtlas needs a property that lets the user specify a NiFi user 
ID to impersonate, so that the required policies can be administered.-
Instead of letting the user specify a NiFi user ID, the updated PR 
([2577|https://github.com/apache/nifi/pull/2577]) fixes lineage computation 
for unauthorized users.
Without PR2577, the computed lineage looks like this:
!unauthorized-query.png|width=200!

With PR2577:
!unauthorized-query-with-fix.png|width=100!

This issue was originally reported by [~nayakmahesh616].

A simplified NiFi flow template to test the proposed fix is attached.


> ReportLineageToAtlas complete path strategy does not report some lineages 
> with secured NiFi
> -------------------------------------------------------------------------------------------
>
>                 Key: NIFI-4993
>                 URL: https://issues.apache.org/jira/browse/NIFI-4993
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Extensions
>    Affects Versions: 1.5.0
>            Reporter: Koji Kawamura
>            Assignee: Koji Kawamura
>            Priority: Major
>         Attachments: NIFI-4993.xml, flow-screenshot.png, hdfs-route.png, 
> kafka-route.png, unauthorized-query-with-fix.png, unauthorized-query.png
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
