[ https://issues.apache.org/jira/browse/NIFI-4752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Kawamura updated NIFI-4752:
--------------------------------
    Description: 
The ['Provenance Events' documentation|https://nifi.apache.org/docs/nifi-docs/html/developer-guide.html#provenance_events] describes the REPLAY event as follows:
{quote}
Indicates a provenance event for replaying a FlowFile. The UUID of the event 
indicates the UUID of the original FlowFile that is being replayed. The event 
contains one Parent UUID that is also the UUID of the FlowFile that is being 
replayed and one Child UUID that is the UUID of the a newly created FlowFile 
that will be re-queued for processing
{quote}

The default PersistentProvenanceRepository behaves as documented, but 
WriteAheadProvenanceRepository returns REPLAY events whose FlowFile UUID is 
the Child UUID rather than the UUID of the original FlowFile.

Here are the lines of code that set the FlowFile UUID for provenance events:
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-provenance-repository-bundle/nifi-persistent-provenance-repository/src/main/java/org/apache/nifi/provenance/schema/LookupTableEventRecord.java#L276-L280
{code}
        String uuid = updatedAttributes == null ? null : updatedAttributes.get(CoreAttributes.UUID.key());
        if (uuid == null) {
            uuid = previousAttributes == null ? null : previousAttributes.get(CoreAttributes.UUID.key());
        }
        builder.setFlowFileUUID(uuid);
{code}

WriteAheadProvenanceRepository does not seem to persist the 'FlowFile UUID' 
value that FlowController sets when replay events are registered. Instead, 
WriteAheadProvenanceRepository fills 'FlowFile UUID' from the updated or 
previous 'UUID' attribute.
I don't know much of the background on why it is implemented this way, but it 
seems to drop 'FlowFile UUID' to reduce IO, on the assumption that the value 
can be derived from the attributes.
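
To illustrate the suspected cause, here is a minimal standalone sketch of the same fallback order, using plain maps instead of the NiFi classes. The class name ReplayUuidSketch, the method resolveFlowFileUuid, and the sample UUID strings are hypothetical, and it assumes that a REPLAY event carries the child's UUID in its updated attributes and the original FlowFile's UUID in its previous attributes. It is not the actual repository code.

```java
import java.util.HashMap;
import java.util.Map;

public class ReplayUuidSketch {
    // Stand-in for CoreAttributes.UUID.key(); assumed here to be "uuid".
    static final String UUID_KEY = "uuid";

    // Same fallback order as the quoted snippet:
    // prefer the updated attributes, then fall back to the previous ones.
    static String resolveFlowFileUuid(Map<String, String> previousAttributes,
                                      Map<String, String> updatedAttributes) {
        String uuid = updatedAttributes == null ? null : updatedAttributes.get(UUID_KEY);
        if (uuid == null) {
            uuid = previousAttributes == null ? null : previousAttributes.get(UUID_KEY);
        }
        return uuid;
    }

    public static void main(String[] args) {
        // Assumption: the original (parent) FlowFile's UUID is in the previous attributes.
        Map<String, String> previous = new HashMap<>();
        previous.put(UUID_KEY, "parent-uuid");

        // Assumption: the replayed (child) FlowFile's new UUID is in the updated attributes.
        Map<String, String> updated = new HashMap<>();
        updated.put(UUID_KEY, "child-uuid");

        // The updated attribute wins, so the child UUID is reported.
        System.out.println(resolveFlowFileUuid(previous, updated));

        // Without an updated UUID attribute, the previous (parent) UUID is used.
        System.out.println(resolveFlowFileUuid(previous, new HashMap<>()));
    }
}
```

Under these assumptions the first call yields the child UUID, which matches the behavior reported above, while the documentation expects the parent UUID.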

  was:
The ['Provenance Events' 
documentation|https://nifi.apache.org/docs/nifi-docs/html/developer-guide.html#provenance_events]
 describes about REPLAY event as follows:
{quate}
Indicates a provenance event for replaying a FlowFile. The UUID of the event 
indicates the UUID of the original FlowFile that is being replayed. The event 
contains one Parent UUID that is also the UUID of the FlowFile that is being 
replayed and one Child UUID that is the UUID of the a newly created FlowFile 
that will be re-queued for processing
{quate}

The default PersistentProvenanceRepository behaves as written in the doc. But 
WriteAheadProvenanceRepository returns REPLAY events having Child UUID as its 
FlowFile UUID instead.

Here is the lines of code that set FlowFile UUID for the provenance events.
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-provenance-repository-bundle/nifi-persistent-provenance-repository/src/main/java/org/apache/nifi/provenance/schema/LookupTableEventRecord.java#L276-L280

WriteAheadProvenanceRepository does not seem to have 'FlowFile UUID'  value 
persisted, which is set by FlowController when replay events are registered. 
Instead, WriteAheadProvenanceRepository fill 'FlowFile UUID' from updated or 
previous 'UUID' attribute.
I don't know much background on why it is implemented this way, but it seems it 
drops 'FlowFile UUID' to reduce IO based on an assumption that it can be set by 
attributes.


> REPLAY events returned by WriteAheadProvenanceRepository have child FlowFile 
> UUID as event FlowFile UUID
> --------------------------------------------------------------------------------------------------------
>
>                 Key: NIFI-4752
>                 URL: https://issues.apache.org/jira/browse/NIFI-4752
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.4.0
>            Reporter: Koji Kawamura
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
