Bob Paulin created NIFI-13779:
---------------------------------
Summary: [NiFi 2.x Python] Missing Some Data Provenance Events
from Python Processors
Key: NIFI-13779
URL: https://issues.apache.org/jira/browse/NIFI-13779
Project: Apache NiFi
Issue Type: Bug
Components: Core Framework
Affects Versions: 2.0.0-M4
Environment: Mac ARM64
Reporter: Bob Paulin
If I run the following Python processor (FlowFileTransform type), which defines
two relationships, coco and annotations, from the codebase
[https://github.com/bobpaulin/nifi-ai-talk/tree/main/table-detection-processor]
I get Data Provenance events for both relationships when I terminate the
relationships. I do NOT get Data Provenance events for either relationship
when that relationship is connected to another processor.
See the flow definition:
[https://github.com/bobpaulin/nifi-ai-talk/blob/main/flow_defs/TestTable.json]
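For reference, a processor of this general shape can be sketched roughly as
below. This is not the code from the linked repository: the class name
TableRouter, the output.kind and routed.to attributes, the relationship
descriptions, and the routing rule are all hypothetical, and the fallback
classes in the except branch are stand-ins that exist only so the sketch can be
imported outside a NiFi Python environment.

```python
# Sketch only: a FlowFileTransform processor with two custom relationships.
try:
    from nifiapi.flowfiletransform import FlowFileTransform, FlowFileTransformResult
    from nifiapi.relationship import Relationship
except ImportError:
    # Minimal stand-ins (NOT the real nifiapi classes) for running outside NiFi.
    class FlowFileTransform:
        pass

    class Relationship:
        def __init__(self, name, description, auto_terminated=False):
            self.name = name
            self.description = description
            self.auto_terminated = auto_terminated

    class FlowFileTransformResult:
        def __init__(self, relationship, attributes=None, contents=None):
            self.relationship = relationship
            self.attributes = attributes
            self.contents = contents

        def getRelationship(self):
            return self.relationship

        def getContents(self):
            return self.contents

        def getAttributes(self):
            return self.attributes


class TableRouter(FlowFileTransform):  # hypothetical name
    class Java:
        implements = ['org.apache.nifi.python.processor.FlowFileTransform']

    class ProcessorDetails:
        version = '0.0.1-SNAPSHOT'
        description = 'Hypothetical example: routes to coco or annotations.'

    def __init__(self, **kwargs):
        pass

    def getRelationships(self):
        # Two custom relationships, named as in the report.
        return [
            Relationship(name='coco', description='COCO-format output'),
            Relationship(name='annotations', description='Annotation output'),
        ]

    def transform(self, context, flowfile):
        # Hypothetical routing rule: choose the relationship from an attribute.
        kind = flowfile.getAttribute('output.kind')
        target = 'coco' if kind == 'coco' else 'annotations'
        return FlowFileTransformResult(
            relationship=target,
            contents=flowfile.getContentsAsBytes(),
            attributes={'routed.to': target},
        )
```

With a processor like this, the behavior described above would be checked by
comparing the provenance events seen when coco and annotations are terminated
against those seen when they are connected to a downstream processor.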
I believe the issue is due to how we're cloning the flow file and using the
clone as the "transformed" flow file. There is logic to drop provenance events
on the cloned flow file. See
[https://github.com/apache/nifi/blob/563d7ea6140c9cd847ddae56f3d3a1690abd6972/nifi-framework-bundle/nifi-framework/nifi-framework-components/src/main/java/org/apache/nifi/controller/repository/StandardProcessSession.java#L927]
This is because the data will be changed, but the clone will be seen as a "New"
flow file since its original will be null.
I suggest we send the cloned flow file to the original relationship and use
the incoming flow file in
[https://github.com/apache/nifi/blob/445d34f91e7581c4f4f92540bc6b055118e5966e/nifi-extension-bundles/nifi-py4j-extension-bundle/nifi-py4j-bridge/src/main/java/org/apache/nifi/python/processor/FlowFileTransformProxy.java]
as the transformed flow file. A PR will be incoming.