Mark Payne created NIFI-3152:
--------------------------------

             Summary: If Provenance Repository runs out of disk space, it may not recover even when disk space is freed up
                 Key: NIFI-3152
                 URL: https://issues.apache.org/jira/browse/NIFI-3152
             Project: Apache NiFi
          Issue Type: Bug
            Reporter: Mark Payne
            Assignee: Mark Payne
             Fix For: 1.1.1

If we run out of disk space in the provenance repository, we can sometimes get into a situation where the logs show us still waiting for the repo to roll over, even after disk space is freed up. A thread dump shows that the processors are trying to force the repo to roll over. However, the rollover never completes because we can't create an IndexWriter:

{code}
"Provenance Repository Rollover Thread-1" Id=128 TIMED_WAITING on null
    at java.lang.Thread.sleep(Native Method)
    at org.apache.lucene.store.Lock.obtain(Lock.java:92)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:755)
    at org.apache.nifi.provenance.lucene.SimpleIndexManager.borrowIndexWriter(SimpleIndexManager.java:104)
    - waiting on org.apache.nifi.provenance.lucene.SimpleIndexManager@22f9da45
    at org.apache.nifi.provenance.PersistentProvenanceRepository.mergeJournals(PersistentProvenanceRepository.java:1711)
    at org.apache.nifi.provenance.PersistentProvenanceRepository$8.run(PersistentProvenanceRepository.java:1311)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

    Number of Locked Synchronizers: 1
    - java.util.concurrent.ThreadPoolExecutor$Worker@850f87f
{code}

The IndexWriter is blocked in its constructor, waiting to obtain the write lock for the Directory. Digging around, I believe the issue is in SimpleIndexManager.returnIndexWriter: it calls IndexWriter.commit(), but if that throws an Exception, we don't properly close the writer.
If we are running out of disk space, IndexWriter.commit() is likely to throw an Exception, so this appears to be the root cause.
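
To illustrate the suspected failure mode, here is a minimal sketch of a return path that closes the writer even when commit() throws. This is not the actual NiFi patch. CommittableWriter, FakeWriter, and returnWriter are hypothetical stand-ins written for this example so it runs without Lucene on the classpath; only the commit-then-close shape is taken from the description above. Whether close() is the right cleanup call in the real SimpleIndexManager is also an assumption.

```java
import java.io.Closeable;
import java.io.IOException;

class ReturnWriterSketch {

    /** Hypothetical stand-in for the two writer calls that matter here. */
    interface CommittableWriter extends Closeable {
        void commit() throws IOException;
    }

    /** Fake writer: "holds" a write lock until close(), and can fail on commit. */
    static final class FakeWriter implements CommittableWriter {
        private final boolean failOnCommit;
        boolean lockHeld = true;

        FakeWriter(boolean failOnCommit) {
            this.failOnCommit = failOnCommit;
        }

        @Override
        public void commit() throws IOException {
            if (failOnCommit) {
                // Simulates what a full disk might produce during commit.
                throw new IOException("No space left on device");
            }
        }

        @Override
        public void close() {
            lockHeld = false; // closing releases the simulated write lock
        }
    }

    /**
     * Commits and then always closes the writer. The finally block runs
     * whether or not commit() throws, so the lock is released either way.
     * Note: if close() also throws, its exception replaces the one from commit().
     */
    static void returnWriter(CommittableWriter writer) throws IOException {
        try {
            writer.commit();
        } finally {
            writer.close();
        }
    }

    public static void main(String[] args) {
        FakeWriter writer = new FakeWriter(true);
        try {
            returnWriter(writer);
        } catch (IOException e) {
            System.out.println("commit failed: " + e.getMessage());
        }
        System.out.println("lock still held: " + writer.lockHeld);
    }
}
```

With the finally block in place, the simulated lock is released even though commit() failed. Without it, lockHeld would stay true, which mirrors the thread dump above where a new IndexWriter waits on the write lock indefinitely.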