Jessica Cheng created LUCENE-4989:
-------------------------------------

             Summary: Hanging on DocumentsWriterStallControl.waitIfStalled forever
                 Key: LUCENE-4989
                 URL: https://issues.apache.org/jira/browse/LUCENE-4989
             Project: Lucene - Core
          Issue Type: Bug
          Components: core/index
    Affects Versions: 4.1
         Environment: Linux 2.6.32
            Reporter: Jessica Cheng


In an environment where our underlying storage was timing out on various 
operations, we found all of our indexing threads eventually stuck in the 
following state (for 4 days so far):

"Thread-0" daemon prio=5 Thread id=556  WAITING
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:503)
        at org.apache.lucene.index.DocumentsWriterStallControl.waitIfStalled(DocumentsWriterStallControl.java:74)
        at org.apache.lucene.index.DocumentsWriterFlushControl.waitIfStalled(DocumentsWriterFlushControl.java:676)
        at org.apache.lucene.index.DocumentsWriter.preUpdate(DocumentsWriter.java:301)
        at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:361)
        at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1484)
        at ...

I have not yet enabled detailed logging or tried to reproduce this, but looking 
at the code, I see that DWFC.abortPendingFlushes does

        try {
          dwpt.abort();
          doAfterFlush(dwpt);
        } catch (Throwable ex) {
          // ignore - keep on aborting the flush queue
        }

(and the same for the blocked ones). Since the throwable is ignored, I can't 
say for sure, but I've seen DWPT.abort throw in other cases, so if it does 
throw, we'd fail to call doAfterFlush and never decrement flushBytes for that 
DWPT, which might explain why the threads stay stalled. This can be a problem, 
right? Is it possible to do this instead:

        try {
          dwpt.abort();
        } catch (Throwable ex) {
          // ignore - keep on aborting the flush queue
        } finally {
          try {
            doAfterFlush(dwpt);
          } catch (Throwable ex2) {
            // ignore - keep on aborting the flush queue
          }
        }

It's ugly but safer. Otherwise, maybe at least add logging for the throwable 
to confirm whether or not this is happening.
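
To make the concern concrete, here is a small standalone sketch of the two 
patterns. It is not Lucene code: FakeWriter and FakeFlushControl are 
hypothetical stand-ins for a DWPT and for the flush control's byte accounting, 
and the byte counts are made up. It only illustrates how a throwing abort() 
could leave flushBytes non-zero under the current pattern, assuming 
doAfterFlush is the only place the counter is decremented.

```java
import java.util.ArrayList;
import java.util.List;

public class AbortAccountingSketch {

    /** Hypothetical stand-in for a DWPT whose abort() may throw. */
    static final class FakeWriter {
        final long bytes;
        final boolean failOnAbort;

        FakeWriter(long bytes, boolean failOnAbort) {
            this.bytes = bytes;
            this.failOnAbort = failOnAbort;
        }

        void abort() {
            if (failOnAbort) {
                throw new IllegalStateException("simulated storage timeout");
            }
        }
    }

    /** Hypothetical stand-in for the flush control's byte accounting. */
    static final class FakeFlushControl {
        long flushBytes;

        void add(FakeWriter w) {
            flushBytes += w.bytes;
        }

        void doAfterFlush(FakeWriter w) {
            flushBytes -= w.bytes;
        }

        /** Pattern quoted from abortPendingFlushes: a throw skips doAfterFlush. */
        void abortAllSkipping(List<FakeWriter> writers) {
            for (FakeWriter w : writers) {
                try {
                    w.abort();
                    doAfterFlush(w);
                } catch (Throwable ex) {
                    // ignore - keep on aborting the flush queue
                }
            }
        }

        /** Proposed pattern: doAfterFlush always runs, even if abort() throws. */
        void abortAllWithFinally(List<FakeWriter> writers) {
            for (FakeWriter w : writers) {
                try {
                    w.abort();
                } catch (Throwable ex) {
                    // ignore - keep on aborting the flush queue
                } finally {
                    try {
                        doAfterFlush(w);
                    } catch (Throwable ex2) {
                        // ignore - keep on aborting the flush queue
                    }
                }
            }
        }
    }

    /** Returns the bytes still counted as flushing after aborting three writers. */
    public static long leakedBytes(boolean useFinally) {
        FakeFlushControl control = new FakeFlushControl();
        List<FakeWriter> writers = new ArrayList<>();
        writers.add(new FakeWriter(100, false));
        writers.add(new FakeWriter(250, true)); // this one throws on abort
        writers.add(new FakeWriter(50, false));
        for (FakeWriter w : writers) {
            control.add(w);
        }
        if (useFinally) {
            control.abortAllWithFinally(writers);
        } else {
            control.abortAllSkipping(writers);
        }
        return control.flushBytes;
    }

    public static void main(String[] args) {
        System.out.println("current pattern leaves flushBytes = " + leakedBytes(false));
        System.out.println("finally pattern leaves flushBytes = " + leakedBytes(true));
    }
}
```

In this sketch the current pattern leaves 250 bytes counted as still flushing, 
while the finally variant brings the counter back to 0. Whether the real stall 
check depends on flushBytes in the same way is something I have not verified.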
