[ https://issues.apache.org/jira/browse/NIFI-13779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bob Paulin updated NIFI-13779:
------------------------------
Description:
If I run the following Python processor (FlowFileTransform type), which has two
relationships defined, coco and annotations, from the codebase
[https://github.com/bobpaulin/nifi-ai-talk/tree/main/table-detection-processor]
I get Data Provenance events from both relationships when I terminate the
relationships. I do NOT get Data Provenance events from either relationship
when that relationship is connected to another processor.
See the flow:
[https://github.com/bobpaulin/nifi-ai-talk/blob/main/flow_defs/TestTable.json]
I believe the issue is caused by how we clone the flow file and then use the
clone as the "transformed" flow file. There is logic that drops provenance
events for the cloned flow file.
See:
[https://github.com/apache/nifi/blob/563d7ea6140c9cd847ddae56f3d3a1690abd6972/nifi-framework-bundle/nifi-framework/nifi-framework-components/src/main/java/org/apache/nifi/controller/repository/StandardProcessSession.java#L927]
The content is changed, but the clone is treated as a "new" flow file because
its original is null.
I suggest we send the cloned flow file to the original relationship and, in
[https://github.com/apache/nifi/blob/445d34f91e7581c4f4f92540bc6b055118e5966e/nifi-extension-bundles/nifi-py4j-extension-bundle/nifi-py4j-bridge/src/main/java/org/apache/nifi/python/processor/FlowFileTransformProxy.java]
use the incoming flow file as the transformed flow file.
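The reasoning above can be illustrated with a toy model. This is a hypothetical, heavily simplified Python sketch, not NiFi's StandardProcessSession or FlowFileTransformProxy code: the names ToySession, Record, transform_via_clone and transform_in_place are invented for illustration. The model tracks only CONTENT_MODIFIED events and encodes a single rule taken from this report (a flow file whose original is null is treated as new, so no CONTENT_MODIFIED event is emitted for it). It does not model the difference between terminated and connected relationships.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Record:
    """One flow file tracked by the toy session (hypothetical, not a NiFi class)."""
    flowfile_id: int
    # Id of the flow file as it entered the session; None means "created in this session".
    original: Optional[int]
    content_changed: bool = False
    relationship: Optional[str] = None


@dataclass
class ToySession:
    """Hypothetical, simplified stand-in for a process session.

    Models only the rule described in the report: a record whose original is
    None is treated as new, so no CONTENT_MODIFIED event is generated for it.
    """
    records: List[Record] = field(default_factory=list)
    next_id: int = 1000

    def receive(self, flowfile_id: int) -> Record:
        rec = Record(flowfile_id, original=flowfile_id)
        self.records.append(rec)
        return rec

    def clone(self, parent: Record) -> Record:
        # A clone is created inside the session, so it has no original.
        # (The parent's content is not modeled in this toy.)
        rec = Record(self.next_id, original=None)
        self.next_id += 1
        self.records.append(rec)
        return rec

    def write(self, rec: Record) -> None:
        rec.content_changed = True

    def transfer(self, rec: Record, relationship: str) -> None:
        rec.relationship = relationship

    def content_modified_events(self) -> List[int]:
        # CONTENT_MODIFIED only for changed records that existed before the session.
        return [r.flowfile_id for r in self.records
                if r.content_changed and r.original is not None]


def transform_via_clone(session: ToySession, incoming: Record, rel: str) -> None:
    """Pattern described in the report: the clone carries the transformed content."""
    transformed = session.clone(incoming)
    session.write(transformed)
    session.transfer(transformed, rel)
    session.transfer(incoming, "original")


def transform_in_place(session: ToySession, incoming: Record, rel: str) -> None:
    """Suggested pattern: the clone goes to 'original', the incoming flow file is transformed."""
    copy = session.clone(incoming)
    session.transfer(copy, "original")
    session.write(incoming)
    session.transfer(incoming, rel)


before = ToySession()
transform_via_clone(before, before.receive(1), "coco")
print(before.content_modified_events())  # []

after = ToySession()
transform_in_place(after, after.receive(1), "coco")
print(after.content_modified_events())  # [1]
```

In the toy, swapping which flow file receives the new content is enough to make the CONTENT_MODIFIED event appear, which mirrors the change proposed above.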
With the PR, I get the expected [CONTENT_MODIFIED] provenance event:
!NIFI-13779-DataProvenance.png!
was:
If I run the following Python processor (FlowFileTransform type), which has two
relationships defined, coco and annotations, from the codebase
[https://github.com/bobpaulin/nifi-ai-talk/tree/main/table-detection-processor]
I get Data Provenance events from both relationships when I terminate the
relationships. I do NOT get Data Provenance events from either relationship
when that relationship is connected to another processor.
See the flow:
[https://github.com/bobpaulin/nifi-ai-talk/blob/main/flow_defs/TestTable.json]
I believe the issue is caused by how we clone the flow file and then use the
clone as the "transformed" flow file. There is logic that drops provenance
events for the cloned flow file.
See:
[https://github.com/apache/nifi/blob/563d7ea6140c9cd847ddae56f3d3a1690abd6972/nifi-framework-bundle/nifi-framework/nifi-framework-components/src/main/java/org/apache/nifi/controller/repository/StandardProcessSession.java#L927]
The content is changed, but the clone is treated as a "new" flow file because
its original is null.
I suggest we send the cloned flow file to the original relationship and, in
[https://github.com/apache/nifi/blob/445d34f91e7581c4f4f92540bc6b055118e5966e/nifi-extension-bundles/nifi-py4j-extension-bundle/nifi-py4j-bridge/src/main/java/org/apache/nifi/python/processor/FlowFileTransformProxy.java]
use the incoming flow file as the transformed flow file. A PR is incoming.
> [NiFi 2.x Python] Missing Some Data Provenance Events from Python Processors
> ----------------------------------------------------------------------------
>
> Key: NIFI-13779
> URL: https://issues.apache.org/jira/browse/NIFI-13779
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core Framework
> Affects Versions: 2.0.0-M4
> Environment: Mac ARM64
> Reporter: Bob Paulin
> Priority: Major
> Attachments: NIFI-13779-DataProvenance.png
>
> Time Spent: 10m
> Remaining Estimate: 0h
--
This message was sent by Atlassian Jira
(v8.20.10#820010)