Mark Payne created NIFI-3152:
--------------------------------
Summary: If Provenance Repository runs out of disk space, it may not recover even when disk space is freed up
Key: NIFI-3152
URL: https://issues.apache.org/jira/browse/NIFI-3152
Project: Apache NiFi
Issue Type: Bug
Reporter: Mark Payne
Assignee: Mark Payne
Fix For: 1.1.1
If we run out of disk space in the provenance repository, we can sometimes get
into a situation where the logs show that we are still waiting for the repo to
roll over, even after disk space has been freed up. A thread dump shows that the
processors are trying to force the repo to roll over. However, the rollover
never completes because we can't create an IndexWriter:
{code}
"Provenance Repository Rollover Thread-1" Id=128 TIMED_WAITING on null
    at java.lang.Thread.sleep(Native Method)
    at org.apache.lucene.store.Lock.obtain(Lock.java:92)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:755)
    at org.apache.nifi.provenance.lucene.SimpleIndexManager.borrowIndexWriter(SimpleIndexManager.java:104)
    - waiting on org.apache.nifi.provenance.lucene.SimpleIndexManager@22f9da45
    at org.apache.nifi.provenance.PersistentProvenanceRepository.mergeJournals(PersistentProvenanceRepository.java:1711)
    at org.apache.nifi.provenance.PersistentProvenanceRepository$8.run(PersistentProvenanceRepository.java:1311)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
    Number of Locked Synchronizers: 1
    - java.util.concurrent.ThreadPoolExecutor$Worker@850f87f
{code}
The IndexWriter constructor is blocked in Lock.obtain(), repeatedly sleeping
while it waits to acquire the write lock for the Lucene Directory.
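To illustrate why a leaked writer stalls the rollover indefinitely, here is a minimal sketch of a write lock that is acquired by polling, in the spirit of the Lock.obtain -> Thread.sleep frames in the thread dump. PollingWriteLock and LockDemo are hypothetical names; this is a simplified model under that assumption, not Lucene's actual Lock implementation.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class LockDemo {
    public static void main(String[] args) {
        PollingWriteLock lock = new PollingWriteLock();
        lock.tryObtain();                          // a writer that is never closed keeps this
        System.out.println(lock.obtain(50, 10));   // false: freeing disk space does not help
        lock.release();                            // what closing the old writer would do
        System.out.println(lock.obtain(50, 10));   // true
    }
}

/** Hypothetical model of a directory write lock acquired by polling (not Lucene's code). */
class PollingWriteLock {
    private final AtomicBoolean held = new AtomicBoolean(false);

    /** One non-blocking attempt; returns true if the caller now owns the lock. */
    boolean tryObtain() {
        return held.compareAndSet(false, true);
    }

    /** Retries tryObtain() every pollMillis until it succeeds or timeoutMillis elapses. */
    boolean obtain(long timeoutMillis, long pollMillis) {
        final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!tryObtain()) {
            if (System.nanoTime() - deadline >= 0) {
                return false;                       // timed out: the owner never released it
            }
            try {
                Thread.sleep(pollMillis);           // appears as TIMED_WAITING in a thread dump
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status for the caller
                return false;
            }
        }
        return true;
    }

    /** Releases the lock; in the real system this corresponds to closing the writer. */
    void release() {
        held.set(false);
    }
}
```

In this model, freeing disk space changes nothing: only release(), i.e. closing the leaked writer, lets a later obtain() succeed.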
Digging around, I believe the root cause is in
SimpleIndexManager.returnIndexWriter, which calls IndexWriter.commit(). If
commit() throws an Exception, we don't properly close the writer, so it never
releases the Directory's write lock, and every later attempt to create an
IndexWriter for that Directory waits on that lock. When we are out of disk
space, commit() is likely to throw, which explains why the repository stays
stuck even after disk space is freed up.
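A minimal sketch of the kind of change this suggests, assuming the writer is normally closed after a successful commit: if commit() fails, the writer is still shut down so its lock is released, and the original exception is propagated. WriterLike, FakeWriter, and this returnIndexWriter are hypothetical stand-ins rather than NiFi's actual code. The sketch calls rollback() on the failure path because Lucene's IndexWriter.rollback() discards uncommitted changes and closes the writer without attempting another commit; whether the actual fix uses rollback() or close() is not stated in this report.

```java
import java.io.Closeable;
import java.io.IOException;

class ReturnWriterSketch {
    public static void main(String[] args) {
        FakeWriter writer = new FakeWriter(true);  // commit() fails, as on a full disk
        try {
            returnIndexWriter(writer);
        } catch (IOException e) {
            System.out.println("commit failed: " + e.getMessage());
        }
        System.out.println("lock held after failure: " + writer.lockHeld); // false
    }

    /** Hypothetical stand-in for returnIndexWriter: commit, then close; never leak the lock. */
    static void returnIndexWriter(WriterLike writer) throws IOException {
        try {
            writer.commit();
        } catch (IOException | RuntimeException commitFailure) {
            try {
                writer.rollback();                 // discard pending changes, release the lock
            } catch (IOException | RuntimeException rollbackFailure) {
                commitFailure.addSuppressed(rollbackFailure); // keep the original cause primary
            }
            throw commitFailure;
        }
        writer.close();                            // normal path after a successful commit
    }
}

/** Hypothetical interface modeling the three IndexWriter methods used above. */
interface WriterLike extends Closeable {
    void commit() throws IOException;
    void rollback() throws IOException;
}

/** Test double that can simulate a failed commit and tracks whether it holds the lock. */
class FakeWriter implements WriterLike {
    private final boolean failCommit;
    boolean lockHeld = true;                       // acquired when the writer was "created"
    boolean committed = false;

    FakeWriter(boolean failCommit) { this.failCommit = failCommit; }

    @Override public void commit() throws IOException {
        if (failCommit) throw new IOException("No space left on device");
        committed = true;
    }
    @Override public void rollback() { lockHeld = false; }
    @Override public void close() { lockHeld = false; }
}
```

Running main prints `commit failed: No space left on device` followed by `lock held after failure: false`.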
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)