OK, in working on SOLR-4196 I'm exercising opening/closing cores as never
before. I have a little stress program that does about the worst thing
possible, essentially opens and closes a core for every request. It has a
bunch of query and update threads running simultaneously that pick a random
core and do a query or update. I've got a bunch of code that, I think,
prevents any attempt to open _or_ close a core while it is being either
opened or closed by another thread (but I'm verifying).

It runs fine for a couple of hours, then hits a race condition that ends in
a deadlock. I was able to get a thread dump (see below).

CloserThread.run(CoreContainer.java:1920) (second thread below) is, indeed,
new code. The stress-test program is updating cores (which may not be
loaded) like crazy and doing queries on other random cores. It's perfectly
possible to be updating a core that's in the process of being closed; I was
counting on the ref counting to make this OK... The cores are transient in
a limited cache, so they come and go. It looks like I'm trying to close a
core at the same time an update has come in, but I'm not sure whether this
is something that should be prevented by the new code or is an underlying
problem.

So a couple of questions:
1> SOLR-4196 has a whole series of improvements that even let us get here.
Running the stress test program against current trunk barfs before having
time to hit this condition, so the current state is an improvement. What do
you think about me checking 4196 in and opening a separate JIRA for this
issue?

2> Any suggestions on what direction to go next? If it's something easy, I
can just fold it into this patch.

3> Am I just going about things bass-ackwards? Not an unusual state of
affairs unfortunately.....

NOTE: The current patch for SOLR-4196 isn't the one running with this code;
there are a couple more things I want to change. Mostly I'm asking if someone
familiar with the code where the race is encountered has a quick fix....

Thanks,
Erick


Found one Java-level deadlock:
=============================
"commitScheduler-122579-thread-1":
  waiting to lock monitor 7f87c3076d38 (object 78b379a28, a org.apache.solr.update.DefaultSolrCoreState),
  which is held by "Thread-15"
"Thread-15":
  waiting for ownable synchronizer 765e84638, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
  which is held by "commitScheduler-122579-thread-1"

Java stack information for the threads listed above:
===================================================
"commitScheduler-122579-thread-1":
at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:82)
- waiting to lock <78b379a28> (a org.apache.solr.update.DefaultSolrCoreState)
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1354)
at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:573)
- locked <76aa46f58> (a java.lang.Object)
at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:680)
"Thread-15":
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <765e84638> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
at org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:680)
at org.apache.solr.update.DefaultSolrCoreState.closeIndexWriter(DefaultSolrCoreState.java:68)
- locked <78b379a28> (a org.apache.solr.update.DefaultSolrCoreState)
at org.apache.solr.update.DefaultSolrCoreState.close(DefaultSolrCoreState.java:289)
- locked <78b379a28> (a org.apache.solr.update.DefaultSolrCoreState)
at org.apache.solr.update.SolrCoreState.decrefSolrCoreState(SolrCoreState.java:68)
- locked <78b379a28> (a org.apache.solr.update.DefaultSolrCoreState)
at org.apache.solr.core.SolrCore.close(SolrCore.java:975)
at org.apache.solr.core.CloserThread.run(CoreContainer.java:1920)

Found 1 deadlock.
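
Reading the dump, the cycle looks like a plain lock-order inversion: the
commit thread holds the ReentrantLock and wants the DefaultSolrCoreState
monitor, while the closer thread holds that monitor and wants the
ReentrantLock. Here is a minimal standalone sketch of that pattern. It is
not Solr code: the class name, the thread names and the two lock fields are
made up for illustration, and the closer uses lockInterruptibly() only so
the demo can break the cycle and exit cleanly.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderInversionDemo {
    // Hypothetical stand-ins for the two locks named in the dump.
    private static final Object coreStateMonitor = new Object();
    private static final ReentrantLock commitLock = new ReentrantLock();

    /** Provokes the inversion and returns how many threads the JVM reports as deadlocked. */
    public static int reproduce() throws InterruptedException {
        // Makes sure each thread holds its first lock before asking for the second.
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Like the commit thread: ReentrantLock first, then the monitor.
        Thread committer = new Thread(() -> {
            commitLock.lock();
            try {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (coreStateMonitor) {
                    // Only reached once the closer gives up the monitor.
                }
            } finally {
                commitLock.unlock();
            }
        }, "committer");

        // Like the closer thread: monitor first, then the ReentrantLock.
        Thread closer = new Thread(() -> {
            synchronized (coreStateMonitor) {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                try {
                    commitLock.lockInterruptibly();
                    commitLock.unlock();
                } catch (InterruptedException e) {
                    // Interrupted by reproduce() to break the cycle.
                }
            }
        }, "closer");

        committer.setDaemon(true);
        closer.setDaemon(true);
        committer.start();
        closer.start();

        // Poll the JVM's own detector, which covers monitors and ownable synchronizers.
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] deadlocked = null;
        for (int i = 0; i < 100 && deadlocked == null; i++) {
            Thread.sleep(50);
            deadlocked = threads.findDeadlockedThreads();
        }

        // Break the cycle so both threads can finish.
        closer.interrupt();
        committer.join(5000);
        closer.join(5000);
        return deadlocked == null ? 0 : deadlocked.length;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Deadlocked threads reported: " + reproduce());
    }
}
```

If the sketch matches what is going on, the usual ways out are to take the
two locks in the same order on both paths, or to avoid holding the monitor
while waiting on the ReentrantLock. Whether either fits the real code is
exactly what I'm unsure about.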
