[ https://issues.apache.org/jira/browse/LUCENE-4158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-4158:
------------------------------------

    Attachment: LUCENE-4158.patch

This must be 'lucky Monday'. I really enjoy starting the week with a sneaky
concurrency problem, but it's even better when you can reproduce it! And it got
better still: I reproduced the hang with IW output enabled and a breakpoint in
place. How lucky is that! Good news: I know what happens, and we can easily fix
it.

Here we go... Some thread triggers a full flush, which means we take all DWPTs
(DocumentsWriterPerThread instances) out of the loop and flush them to do a
reopen. At the same time, we need to prevent later documents from sneaking into
that flush. We do this by preventing any DWPT that was not part of the full
flush from flushing until the full flush is done. If we pile up lots of docs (or
have a smallish RAM buffer or a low max buffered docs setting), we can easily
cross the stall threshold during that window. Now, if we reach the stall
threshold while the full flush is flushing its last segment, the indexing thread
checks whether it can help with flushing. That is not possible, since all DWPTs
that don't belong to the full flush are blocked. So we cannot help, we are over
the stall limit, and we put ourselves to sleep. Then the full flush finishes and
gives us the wakeup call, but the loop in the stall control sees that we are
still stalled and puts us straight back to sleep. Nobody will ever flush one of
the blocked DWPTs, since all indexing threads are waiting.
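To make the interleaving concrete, here is a heavily simplified, hypothetical
Java model of the hang. It is not Lucene code: StallModel, tryHelpFlush,
waitIfStalledLooping and finishFullFlush are made-up names, and plain
wait/notifyAll stand in for the AbstractQueuedSynchronizer that
DocumentsWriterStallControl actually parks on (see the stack dump). It also
assumes the stall clears only once every blocked DWPT has been flushed.

```java
/**
 * Hypothetical, heavily simplified model of the LUCENE-4158 hang.
 * Not Lucene code: all names are made up for illustration, and plain
 * wait/notifyAll replaces the AbstractQueuedSynchronizer Lucene uses.
 */
public class StallHangSketch {

    static final class StallModel {
        private boolean stalled = true;          // assumption: already over the stall limit
        private boolean fullFlushRunning = true; // a full flush is still in progress
        private int blockedDwpts = 3;            // DWPTs held back until the full flush ends

        // An indexing thread may flush a blocked DWPT only after the full flush is done.
        synchronized boolean tryHelpFlush() {
            if (fullFlushRunning || blockedDwpts == 0) {
                return false;
            }
            blockedDwpts--;
            if (blockedDwpts == 0) {
                stalled = false; // assumption: the stall clears once nothing is pending
                notifyAll();
            }
            return true;
        }

        // Buggy pattern: keep sleeping until somebody else clears the stall.
        synchronized void waitIfStalledLooping() throws InterruptedException {
            while (stalled) {
                wait();
            }
        }

        // The full flush ends and sends the wakeup call, but it does not flush
        // the blocked DWPTs, so `stalled` remains true.
        synchronized void finishFullFlush() {
            fullFlushRunning = false;
            notifyAll();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        StallModel model = new StallModel();
        Thread indexer = new Thread(() -> {
            try {
                model.tryHelpFlush();         // normally fails: full flush still running
                model.waitIfStalledLooping(); // sleeps, wakes up, sleeps again...
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        indexer.setDaemon(true); // let the JVM exit even though the indexer is stuck
        indexer.start();
        Thread.sleep(100);
        model.finishFullFlush();
        indexer.join(500);
        System.out.println("indexer still waiting: " + indexer.isAlive());
        indexer.interrupt();
    }
}
```

Under these assumptions main prints "indexer still waiting: true": the wakeup
arrives, but the loop finds `stalled` still set and puts the thread straight back
to sleep, and no other thread is left to flush the blocked DWPTs.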

Basically a chicken-and-egg problem. This patch simply removes the loop in
DWStallControl (DocumentsWriterStallControl) and lets the higher-level logic
re-stall after it has retried flushing pending DWPT instances, i.e. the writer
heals itself.
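A sketch of that idea under the same simplifications (hypothetical names,
wait/notifyAll instead of AQS, and not the actual patch): the stall-control wait
no longer loops, and a caller loosely modeled on DocumentsWriter.preUpdate
retries flushing pending DWPTs before it stalls again. The wait also re-checks,
under the same monitor, whether helping is currently possible, so a wakeup
cannot be lost between the caller's check and the wait.

```java
/**
 * Hypothetical sketch of the LUCENE-4158 fix idea, not the actual patch:
 * the stall wait no longer loops, and the caller retries flushing pending
 * DWPTs before it stalls again. Names are made up; wait/notifyAll stands in
 * for the AbstractQueuedSynchronizer Lucene uses.
 */
public class StallFixSketch {

    static final class StallModel {
        private boolean stalled = true;          // assumption: already over the stall limit
        private boolean fullFlushRunning = true; // a full flush is still in progress
        private int blockedDwpts = 3;            // DWPTs held back until the full flush ends

        synchronized boolean tryHelpFlush() {
            if (fullFlushRunning || blockedDwpts == 0) {
                return false;
            }
            blockedDwpts--;
            if (blockedDwpts == 0) {
                stalled = false; // assumption: the stall clears once nothing is pending
                notifyAll();
            }
            return true;
        }

        synchronized boolean isStalled() {
            return stalled;
        }

        // Fixed pattern: wait at most once, then return to the caller. Only sleep
        // if helping is impossible right now; checking this under the same monitor
        // as wait() means the full-flush wakeup cannot slip in between.
        synchronized void waitIfStalled() throws InterruptedException {
            boolean canHelp = !fullFlushRunning && blockedDwpts > 0;
            if (stalled && !canHelp) {
                wait();
            }
        }

        synchronized void finishFullFlush() {
            fullFlushRunning = false;
            notifyAll();
        }
    }

    // Higher-level logic: help flush pending DWPTs first, re-stall only if
    // that was not enough. Spurious wakeups simply cause another round.
    static void preUpdate(StallModel model) throws InterruptedException {
        while (true) {
            while (model.tryHelpFlush()) {
                // keep flushing pending DWPTs while we are allowed to
            }
            if (!model.isStalled()) {
                return;
            }
            model.waitIfStalled();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        StallModel model = new StallModel();
        Thread indexer = new Thread(() -> {
            try {
                preUpdate(model);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        indexer.setDaemon(true);
        indexer.start();
        Thread.sleep(100);
        model.finishFullFlush();
        indexer.join(2000);
        System.out.println("indexer still waiting: " + indexer.isAlive());
    }
}
```

With these assumptions main prints "indexer still waiting: false": after the
full-flush wakeup the indexer returns from waitIfStalled, flushes the three
formerly blocked DWPTs itself, and clears the stall.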
                
> Windows Tests (4.x) hanging for 5 hrs in stall control again
> ------------------------------------------------------------
>
>                 Key: LUCENE-4158
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4158
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/index
>    Affects Versions: 4.0, 5.0
>            Reporter: Uwe Schindler
>            Assignee: Simon Willnauer
>            Priority: Critical
>             Fix For: 4.0, 5.0
>
>         Attachments: LUCENE-4158.patch, LUCENE-4158.patch
>
>
> Here is the stack dump from the build (extracted with jstack on Windows):
> http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Windows-Java6-64/118/
> JVM is 1.6.0_32, 64-bit, server, 2 CPUs, Windows 7 64-bit
> {noformat}
> 2012-06-19 17:21:42
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.7-b02 mixed mode):
> "Thread 0" prio=6 tid=0x00000000070dc000 nid=0xcec waiting on condition [0x000000000e18f000]
>    java.lang.Thread.State: WAITING (parking)
>       at sun.misc.Unsafe.park(Native Method)
>       - parking to wait for  <0x00000000f684e8d8> (a org.apache.lucene.index.DocumentsWriterStallControl$Sync)
>       at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
>       at org.apache.lucene.index.DocumentsWriterStallControl.waitIfStalled(DocumentsWriterStallControl.java:120)
>       at org.apache.lucene.index.DocumentsWriterFlushControl.waitIfStalled(DocumentsWriterFlushControl.java:618)
>       at org.apache.lucene.index.DocumentsWriter.preUpdate(DocumentsWriter.java:301)
>       at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:361)
>       at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1327)
>       at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1078)
>       at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1059)
>       at org.apache.lucene.index.TestNRTReaderWithThreads$RunThread.run(TestNRTReaderWithThreads.java:94)
>    Locked ownable synchronizers:
>       - None
> "TEST-TestScope-org.apache.lucene.index.TestNRTReaderWithThreads.testIndexing-seed#[641A6B3E2297F46E]" prio=6 tid=0x00000000070d9800 nid=0x448 in Object.wait() [0x00000000058de000]
>    java.lang.Thread.State: WAITING (on object monitor)
>       at java.lang.Object.wait(Native Method)
>       - waiting on <0x00000000f685b540> (a org.apache.lucene.index.TestNRTReaderWithThreads$RunThread)
>       at java.lang.Thread.join(Thread.java:1186)
>       - locked <0x00000000f685b540> (a org.apache.lucene.index.TestNRTReaderWithThreads$RunThread)
>       at java.lang.Thread.join(Thread.java:1239)
>       at org.apache.lucene.index.TestNRTReaderWithThreads.testIndexing(TestNRTReaderWithThreads.java:61)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>       at java.lang.reflect.Method.invoke(Method.java:597)
>       at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
>       at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
>       at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814)
>       at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875)
>       at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889)
>       at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
>       at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
>       at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>       at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
>       at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
>       at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
>       at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
>       at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:821)
>       at com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132)
>       at com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669)
>       at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695)
>       at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734)
>       at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745)
>       at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>       at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
>       at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
>       at org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51)
>       at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
>       at org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53)
>       at org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52)
>       at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36)
>       at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
>       at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56)
>       at com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605)
>       at com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132)
>       at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551)
>    Locked ownable synchronizers:
>       - None
> "Low Memory Detector" daemon prio=6 tid=0x000000000492f000 nid=0xfe4 runnable [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
>    Locked ownable synchronizers:
>       - None
> "C2 CompilerThread1" daemon prio=10 tid=0x000000000492c000 nid=0x6b0 waiting on condition [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
>    Locked ownable synchronizers:
>       - None
> "C2 CompilerThread0" daemon prio=10 tid=0x00000000002ca800 nid=0xf1c waiting on condition [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
>    Locked ownable synchronizers:
>       - None
> "Attach Listener" daemon prio=10 tid=0x00000000002c9000 nid=0xcc8 waiting on condition [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
>    Locked ownable synchronizers:
>       - None
> "Signal Dispatcher" daemon prio=10 tid=0x0000000004920800 nid=0x91c runnable [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
>    Locked ownable synchronizers:
>       - None
> "Finalizer" daemon prio=8 tid=0x00000000002b5000 nid=0xf4c in Object.wait() [0x00000000048df000]
>    java.lang.Thread.State: WAITING (on object monitor)
>       at java.lang.Object.wait(Native Method)
>       - waiting on <0x00000000e01873c8> (a java.lang.ref.ReferenceQueue$Lock)
>       at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
>       - locked <0x00000000e01873c8> (a java.lang.ref.ReferenceQueue$Lock)
>       at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
>       at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
>    Locked ownable synchronizers:
>       - None
> "Reference Handler" daemon prio=10 tid=0x00000000002ac000 nid=0xe10 in Object.wait() [0x00000000047df000]
>    java.lang.Thread.State: WAITING (on object monitor)
>       at java.lang.Object.wait(Native Method)
>       - waiting on <0x00000000e0156290> (a java.lang.ref.Reference$Lock)
>       at java.lang.Object.wait(Object.java:485)
>       at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
>       - locked <0x00000000e0156290> (a java.lang.ref.Reference$Lock)
>    Locked ownable synchronizers:
>       - None
> "main" prio=6 tid=0x00000000003ae000 nid=0x558 in Object.wait() [0x0000000000e0f000]
>    java.lang.Thread.State: WAITING (on object monitor)
>       at java.lang.Object.wait(Native Method)
>       - waiting on <0x00000000f67e96b8> (a com.carrotsearch.randomizedtesting.RandomizedRunner$2)
>       at java.lang.Thread.join(Thread.java:1186)
>       - locked <0x00000000f67e96b8> (a com.carrotsearch.randomizedtesting.RandomizedRunner$2)
>       at java.lang.Thread.join(Thread.java:1239)
>       at com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:561)
>       at com.carrotsearch.randomizedtesting.RandomizedRunner.run(RandomizedRunner.java:521)
>       at com.carrotsearch.ant.tasks.junit4.slave.SlaveMain.execute(SlaveMain.java:145)
>       at com.carrotsearch.ant.tasks.junit4.slave.SlaveMain.main(SlaveMain.java:238)
>       at com.carrotsearch.ant.tasks.junit4.slave.SlaveMainSafe.main(SlaveMainSafe.java:12)
>    Locked ownable synchronizers:
>       - None
> "VM Thread" prio=10 tid=0x00000000002a3800 nid=0x614 runnable
> "GC task thread#0 (ParallelGC)" prio=6 tid=0x0000000000203000 nid=0x7dc runnable
> "GC task thread#1 (ParallelGC)" prio=6 tid=0x0000000000205000 nid=0xc20 runnable
> "VM Periodic Task Thread" prio=10 tid=0x0000000004938000 nid=0x720 waiting on condition
> JNI global references: 1586
> {noformat}
