[ https://issues.apache.org/jira/browse/NIFI-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17893088#comment-17893088 ]
ASF subversion and git services commented on NIFI-13929:
--------------------------------------------------------
Commit 6d6adfeaeb1d046a59f98d43beca55701995f316 in nifi's branch
refs/heads/main from Mark Payne
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=6d6adfeaeb ]
NIFI-13929 Fixed Provenance Event Handling for Stateless Engine (#9446)
Removed the Provenance Repository from the stateless RepositoryContextFactory
and added it to the DataflowTriggerContext. This was necessary because the
previous design overlooked the possibility of many threads concurrently running
the same dataflow. They all shared the same StatelessProvenanceRepository, but
the code was written as if only a single thread would be using the repository.
As a result, the events that were registered with the stateless provenance
repository were being copied many times into NiFi's underlying provenance
repository.

This refactoring also uncovered some old Java 8 syntax and some methods that
were no longer being used, both of which could be cleaned up.

Finally, in testing, I found that when a Stateless Group was scheduled, it
scheduled the triggering of the stateless group before marking the state as
RUNNING. As a result, a second thread could run, determine that the state was
STARTING instead of RUNNING, and return without triggering the stateless
group. This was addressed by ensuring that the state is set to RUNNING before
the stateless group is scheduled to be triggered.
Signed-off-by: David Handermann <[email protected]>
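
The duplication described in the commit message can be illustrated with a
small, single-threaded simulation. This is only a sketch and not NiFi's actual
code: the class and method names below (ProvenanceDuplicationSketch,
InMemoryEventStore, runWithSharedStore, runWithPerInvocationStore) are
hypothetical. It also assumes that each invocation of the dataflow, on
completion, copies every event it finds in its store into the underlying
repository. To stay deterministic it replays one possible interleaving
sequentially, in which all invocations register their events before any of
them completes.

```java
import java.util.ArrayList;
import java.util.List;

public class ProvenanceDuplicationSketch {

    /** Hypothetical stand-in for a stateless provenance repository: an in-memory event list. */
    static final class InMemoryEventStore {
        private final List<String> events = new ArrayList<>();

        void register(String event) {
            events.add(event);
        }

        List<String> getEvents() {
            return events;
        }
    }

    /** All invocations share one store (assumed model of the old design). */
    static List<String> runWithSharedStore(int invocations, int eventsPerInvocation) {
        InMemoryEventStore shared = new InMemoryEventStore();
        List<String> underlyingRepository = new ArrayList<>();

        // Phase 1: every invocation registers its events in the shared store.
        for (int i = 0; i < invocations; i++) {
            for (int e = 0; e < eventsPerInvocation; e++) {
                shared.register("invocation-" + i + "-event-" + e);
            }
        }
        // Phase 2: each invocation completes and copies everything it sees,
        // including events that belong to the other invocations.
        for (int i = 0; i < invocations; i++) {
            underlyingRepository.addAll(shared.getEvents());
        }
        return underlyingRepository;
    }

    /** Each invocation owns its store (assumed model of the per-trigger design). */
    static List<String> runWithPerInvocationStore(int invocations, int eventsPerInvocation) {
        List<InMemoryEventStore> stores = new ArrayList<>();
        List<String> underlyingRepository = new ArrayList<>();

        // Phase 1: every invocation registers events in its own store.
        for (int i = 0; i < invocations; i++) {
            InMemoryEventStore own = new InMemoryEventStore();
            for (int e = 0; e < eventsPerInvocation; e++) {
                own.register("invocation-" + i + "-event-" + e);
            }
            stores.add(own);
        }
        // Phase 2: each invocation copies only its own events.
        for (InMemoryEventStore own : stores) {
            underlyingRepository.addAll(own.getEvents());
        }
        return underlyingRepository;
    }

    public static void main(String[] args) {
        // 4 simulated invocations, 3 events each: 12 distinct events in total.
        System.out.println("shared store:         " + runWithSharedStore(4, 3).size());        // 48
        System.out.println("per-invocation store: " + runWithPerInvocationStore(4, 3).size()); // 12
    }
}
```

With 4 simulated invocations of 3 events each, the shared store yields 48
entries for 12 distinct events, while the per-invocation store yields exactly
12. Under this model the number of copies grows with the square of the number
of concurrent invocations, which is consistent with the poor performance
reported in the issue.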
> Many duplicates for Provenance Events when running Stateless flow with
> multiple threads
> ---------------------------------------------------------------------------------------
>
> Key: NIFI-13929
> URL: https://issues.apache.org/jira/browse/NIFI-13929
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core Framework
> Reporter: Mark Payne
> Assignee: Mark Payne
> Priority: Critical
> Attachments: events.png, flow.png
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> When running a flow using the Stateless Execution Engine with multiple
> threads, a huge number of duplicate events are written to the provenance
> repository. This results in poor performance as well as badly inaccurate
> provenance data.