Hi Mark,

I see your point that option 2 changes the semantics, so it makes sense to take that one off the table.  I looked at option 1 and took it a bit further, which I believe brings the events closer to my stated goal of ensuring both attributes and content are kept as they were at the time of clone/send etc.  I've made the following changes in a branch [1]:

1) Added clone to the events that should have attribute updates suppressed (Option 1).

2) Broadened updateAttribute to also control updates to the original and current content of the ProvenanceEventRecord.  This means that when attributes are not updated, neither are the original or current content.

3) Added an enrich step to the clone event.  It appears that the send and upload events already have an enrich in the ProvenanceReporter, so they should have the content recorded at the time the event is registered.

I believe this should keep the same semantics while freezing the content at the time of the event for send, upload and clone.  Let me know if this approach makes sense.  I tested this locally, and it does keep the cloned content as it was at the time of cloning even when that content is modified afterwards, as it is in the Python Processors.  It also continues to display correctly for sent and uploaded content (tried with PutFile).
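
To make the idea concrete, here is a rough Python sketch of the behaviour I'm after.  It is not the NiFi code: FlowFile, ProvenanceEvent, register_event, finalize_event and FROZEN_EVENT_TYPES are all hypothetical names made up for illustration, and the real logic in the branch lives in the Java framework classes.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical set of event types whose recorded state is frozen at
# registration time (send, upload and clone, as described above).
FROZEN_EVENT_TYPES = {"SEND", "UPLOAD", "CLONE"}


@dataclass
class FlowFile:
    """Simplified stand-in for a FlowFile: attributes plus content."""
    attributes: Dict[str, str]
    content: bytes


@dataclass
class ProvenanceEvent:
    """Simplified stand-in for a provenance event record."""
    event_type: str
    attributes: Dict[str, str]
    content: bytes


def register_event(event_type: str, flow_file: FlowFile) -> ProvenanceEvent:
    """'Enrich' the event: copy attributes and content at registration time."""
    return ProvenanceEvent(event_type, dict(flow_file.attributes), flow_file.content)


def finalize_event(event: ProvenanceEvent, flow_file: FlowFile) -> ProvenanceEvent:
    """On commit, refresh the event from the FlowFile unless its type is frozen."""
    if event.event_type in FROZEN_EVENT_TYPES:
        # Neither attributes nor content are updated for frozen event types.
        return event
    return ProvenanceEvent(event.event_type, dict(flow_file.attributes), flow_file.content)


# The parent is cloned, then modified later in the same session.
parent = FlowFile({"filename": "a.txt"}, b"original")
clone_event = register_event("CLONE", parent)
other_event = register_event("ATTRIBUTES_MODIFIED", parent)

parent.attributes["filename"] = "b.txt"
parent.content = b"modified"

frozen = finalize_event(clone_event, parent)
refreshed = finalize_event(other_event, parent)
```

With this sketch the clone event keeps the attributes and content the parent had when it was cloned, while an event type outside the frozen set picks up the later changes.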

Sincerely,
Bob Paulin

[1] https://github.com/bobpaulin/nifi/tree/NIFI-13808

On 10/29/2024 10:19 AM, Mark Payne wrote:
Hey Bob,

I think that Option 1 does make sense. We should show the attributes, etc. as 
they are when the FlowFile is cloned. I do not think the semantics would be 
accurate for Option #2. If we were to use the child/clone to populate the clone 
event, that would imply that the child was cloned. It is the parent that is 
being cloned, so the clone event should reflect the parent.

Thanks
-Mark

On Oct 29, 2024, at 8:40 AM, Bob Paulin <b...@bobpaulin.com> wrote:

Hi,

In working with the Python Processors I've got some questions about how NiFi 
handles Clone events.  Currently when a clone event is displayed in a 
provenance event in a processor the Attribute and Content information attached 
to the event represents the originally cloned FlowFile. This includes data 
modifications that are applied after the FlowFile is cloned.  My concern is 
this doesn't give the end user a clear idea of what was actually cloned from 
the original FlowFile.  I see a couple of different ways this could be improved.

1) Treat the CLONE events like SEND events[1].  This would prevent the FlowFile 
attributes from being updated prior to being written to the Provenance repo.

OR

2) Use the FlowFile's clone (the child) to populate the clone event[2].  The 
cloned file has the proper representation of the data at the time of cloning 
and will not contain any updates made to the original FlowFile when being 
displayed.

The first suggestion is more subtle.  It freezes the attribute data but the 
Output Content Claim displayed would still reflect changes to the parent file 
following the clone.

The second suggestion is a more significant/breaking change to how Provenance 
stores clone events.  However the more fundamental change means that the 
Provenance Event will show the correct attributes and content from the time of 
cloning.

My preference is the second suggestion but I also understand if there may be 
reasons for the design displaying the parent FlowFile's information.  My 
interest is improving the experience documented in [3].  Open to suggestions if 
there are other paths to doing so.


Sincerely,

Bob Paulin


[1] 
https://github.com/apache/nifi/blob/aacbd514ce4af7e41f54fc2418394c563395c9bd/nifi-framework-bundle/nifi-framework/nifi-framework-components/src/main/java/org/apache/nifi/controller/repository/StandardProcessSession.java#L1010

[2] 
https://github.com/apache/nifi/blob/aacbd514ce4af7e41f54fc2418394c563395c9bd/nifi-framework-bundle/nifi-framework/nifi-framework-components/src/main/java/org/apache/nifi/controller/repository/StandardProvenanceReporter.java#L457

[3] https://issues.apache.org/jira/browse/NIFI-13808

