[
https://issues.apache.org/jira/browse/NIFI-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15527234#comment-15527234
]
ASF GitHub Bot commented on NIFI-2395:
--------------------------------------
GitHub user mosermw opened a pull request:
https://github.com/apache/nifi/pull/1072
Nifi 2429 PersistentProvenanceRepository bug fixes
In this PR I cherry-picked these commits from master into 0.x
cfc8a9613cb071247ef22f8fe4a3abb4e6b83151 NIFI-2395
PersistentProvenanceRepository deadlock on journal merge and index exception
e9b87dd73436b1659b1fddcc400e7248bc00f1ee NIFI-2452
PersistentProvenanceRepository index readers can be prematurely closed
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mosermw/nifi NIFI-2429
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/nifi/pull/1072.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1072
----
commit 29460a724f583119eada146661ab654fc961c185
Author: Mark Payne <[email protected]>
Date: 2016-07-28T14:19:45Z
NIFI-2395 This closes #734. Ensure that if we fail to index provenance
events we do not prevent the repo from continuing to merge journals
commit bf8d66566c8eee911aea48b0b97942500851cf2c
Author: Mike Moser <[email protected]>
Date: 2016-09-26T20:22:50Z
NIFI-2429 changes needed after cherry-picking NIFI-2395 from master
commit fba761508d0b1fd39e8d27e3e80a6d6e8e22c0cc
Author: Mark Payne <[email protected]>
Date: 2016-08-01T18:51:02Z
NIFI-2452: Ensure that we do not close Index Readers that are still in use
----
> PersistentProvenanceRepository Deadlocks caused by a blocked journal merge
> --------------------------------------------------------------------------
>
> Key: NIFI-2395
> URL: https://issues.apache.org/jira/browse/NIFI-2395
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core Framework
> Affects Versions: 0.6.0, 0.7.0
> Reporter: Brian Davis
> Assignee: Joseph Witt
> Priority: Blocker
> Fix For: 1.0.0, 1.0.0-Beta
>
>
> I have a NiFi instance that has been running for about a week and has
> deadlocked at least 3 times in that period. By deadlock I mean that the whole
> NiFi instance stops making any progress on flowfiles. I looked at the stack
> trace and there are a lot of threads stuck doing tasks in the
> PersistentProvenanceRepository. Looking at the code, I think this is what is
> happening:
> There is a ReadWriteLock on which all the readers are waiting for a writer.
> The writer is in this loop:
> {code}
> while (journalFileCount > journalCountThreshold || repoSize > sizeThreshold) {
>     // if a shutdown happens while we are in this loop, kill the rollover thread and break
>     if (this.closed.get()) {
>         if (future != null) {
>             future.cancel(true);
>         }
>         break;
>     }
>     if (repoSize > sizeThreshold) {
>         logger.debug("Provenance Repository has exceeded its size threshold; will trigger purging of oldest events");
>         purgeOldEvents();
>         journalFileCount = getJournalCount();
>         repoSize = getSize(getLogFiles(), 0L);
>         continue;
>     } else {
>         // if we are constrained by the number of journal files rather than the size of the repo,
>         // then we will just sleep a bit because another thread is already actively merging the journals,
>         // due to the runnable that we scheduled above
>         try {
>             Thread.sleep(100L);
>         } catch (final InterruptedException ie) {
>         }
>     }
>     logger.debug("Provenance Repository is still behind. Keeping flow slowed down "
>         + "to accommodate. Currently, there are {} journal files ({} bytes) and "
>         + "threshold for blocking is {} ({} bytes)",
>         journalFileCount, repoSize, journalCountThreshold, sizeThreshold);
>     journalFileCount = getJournalCount();
>     repoSize = getSize(getLogFiles(), 0L);
> }
> logger.info("Provenance Repository has now caught up with rolling over journal files. Current number of "
>     + "journal files to be rolled over is {}", journalFileCount);
> }
> {code}
> My NiFi sits in that sleep indefinitely. It cannot move forward because the
> thread doing the merge is stalled. The thread doing the merge is at:
> {code}
> accepted = eventQueue.offer(new Tuple<>(record, blockIndex), 10, TimeUnit.MILLISECONDS);
> {code}
> so the queue is full.
> What I believe happened is that the callables created here:
> {code}
> final Callable<Object> callable = new Callable<Object>() {
>     @Override
>     public Object call() throws IOException {
>         while (!eventQueue.isEmpty() || !finishedAdding.get()) {
>             final Tuple<StandardProvenanceEventRecord, Integer> tuple;
>             try {
>                 tuple = eventQueue.poll(10, TimeUnit.MILLISECONDS);
>             } catch (final InterruptedException ie) {
>                 continue;
>             }
>             if (tuple == null) {
>                 continue;
>             }
>             indexingAction.index(tuple.getKey(), indexWriter, tuple.getValue());
>         }
>         return null;
>     }
> {code}
> finish before the offer adds its first event, because I do not see any Index
> Provenance Events threads. My guess is that the while loop condition is wrong
> and should be && instead of ||.
> I raised the thread count for index creation from 1 to 3 to see if that
> helps. I will report back later this week on whether it does.
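
The failure mode described above (a producer retrying offer() on a full
bounded queue after every consuming thread has exited) can be reproduced
outside NiFi. The following is a minimal sketch only, not the NiFi code and
not the patch in this pull request: the class IndexQueueSketch, the method
run(), the queue capacity of 2 and the simulated IOException are all
hypothetical names and values chosen for illustration. It assumes the
consumer dies on an indexing error, which is one way the indexing threads
could disappear, and it shows one possible guard: the producer stops
retrying once the consumer's Future reports that it is done.

```java
import java.io.IOException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch; none of these names come from the NiFi code base.
class IndexQueueSketch {

    /**
     * Queues eventCount events into a small bounded queue drained by one consumer.
     * If failOnFirst is true, the consumer throws on its first event, the way an
     * indexing thread might die on an index exception.
     * Returns true if every event was queued, false if the producer gave up
     * because the consumer had already exited.
     */
    static boolean run(final int eventCount, final boolean failOnFirst) {
        final BlockingQueue<Integer> eventQueue = new ArrayBlockingQueue<>(2);
        final AtomicBoolean finishedAdding = new AtomicBoolean(false);
        final ExecutorService exec = Executors.newSingleThreadExecutor();

        // Consumer: same loop shape as the callable quoted in the issue.
        final Future<?> consumer = exec.submit(() -> {
            while (!eventQueue.isEmpty() || !finishedAdding.get()) {
                final Integer event = eventQueue.poll(10, TimeUnit.MILLISECONDS);
                if (event == null) {
                    continue;
                }
                if (failOnFirst) {
                    throw new IOException("simulated index failure");
                }
            }
            return null;
        });

        boolean allQueued = true;
        try {
            for (int i = 0; i < eventCount && allQueued; i++) {
                boolean accepted = false;
                while (!accepted) {
                    accepted = eventQueue.offer(i, 10, TimeUnit.MILLISECONDS);
                    // Guard (an assumption of this sketch): without this check the
                    // loop retries forever once the consumer is gone and the queue is full.
                    if (!accepted && consumer.isDone()) {
                        allQueued = false;
                        break;
                    }
                }
            }
            finishedAdding.set(true);
            exec.shutdown();
            exec.awaitTermination(5, TimeUnit.SECONDS);
        } catch (final InterruptedException ie) {
            Thread.currentThread().interrupt();
            exec.shutdownNow();
            throw new IllegalStateException("interrupted while queueing events", ie);
        }
        return allQueued;
    }

    public static void main(final String[] args) {
        System.out.println("healthy consumer, all queued: " + run(10, false));
        System.out.println("failing consumer, all queued: " + run(10, true));
    }
}
```

With the guard removed, the second call never returns, which matches the
symptom in the report: the merge thread sits in offer() while no Index
Provenance Events thread is left to drain the queue.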