[ 
https://issues.apache.org/jira/browse/SOLR-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559516#comment-14559516
 ] 

Timothy Potter commented on SOLR-7587:
--------------------------------------

Have a little more information about what caused this failure. Had to dig into 
the JavaDoc for ReentrantReadWriteLock a bit and found this little gem:

{quote}
Reentrancy also allows downgrading from the write lock to a read lock, by 
acquiring the write lock, then the read lock and then releasing the write lock. 
However, upgrading from a read lock to the write lock is not possible.
{quote}
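
To make the quoted behaviour concrete, here is a minimal standalone sketch (not Solr code; the class and method names are made up for illustration). It uses {{tryLock()}} for the upgrade attempt so the demo returns instead of hanging; a plain {{writeLock().lock()}} at that point would block forever, which is what a stalled thread looks like.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class, not part of Solr.
class LockUpgradeDemo {

    /** Returns true if the write lock could be taken while this thread holds the read lock. */
    static boolean tryUpgrade() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.readLock().lock();
        try {
            // tryLock() fails immediately because a read lock is held, even though
            // the holder is this very thread. lock() would block forever instead.
            boolean upgraded = lock.writeLock().tryLock();
            if (upgraded) {
                lock.writeLock().unlock();
            }
            return upgraded;
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Returns true if a write-to-read downgrade leaves only the read lock held. */
    static boolean downgrade() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.writeLock().lock();
        lock.readLock().lock();     // allowed: the writer may also take the read lock
        lock.writeLock().unlock();  // downgrade: only the read lock remains
        try {
            return !lock.isWriteLocked() && lock.getReadHoldCount() == 1;
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("upgrade read -> write succeeded: " + tryUpgrade());   // false
        System.out.println("downgrade write -> read succeeded: " + downgrade());  // true
    }
}
```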

Every test failure caused by this situation occurred during a commit. 
Commits acquire a read-lock on the VersionInfo object (see 
{{DistributedUpdateProcessor#versionAdd}} method). My code introduced the need 
for acquiring the write-lock and, as we learned above, you can't upgrade a 
read-lock to a write-lock, so the commit thread just blocks forever on the 
write-lock request. The problem is where I put this code: I hung it off the 
code that handles {{firstSearcher}} events, since I need a searcher in order to 
look up the max value from the index to seed version buckets with. If that were 
the whole story, though, the test should fail every time, which is not the 
case, so clearly there's some timing involved in this failure. This code only 
fires when {{currSearcher == null}}, and I don't get how that could be the case 
at the point where the test is sending a commit (see below)?

{code}
        at org.apache.solr.update.VersionInfo.blockUpdates(VersionInfo.java:118)
        at org.apache.solr.update.UpdateLog.onFirstSearcher(UpdateLog.java:1604)
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1810)
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1505)
        at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:617)
        - locked <0x00000000f6f09a10> (a java.lang.Object)
        at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95)
        at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1635)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1612)
        at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:161)
        at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:2051)
        at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:179)
        at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135)
        at org.apache.solr.client.solrj.SolrClient.commit(SolrClient.java:483)
        at org.apache.solr.client.solrj.SolrClient.commit(SolrClient.java:502)
        at org.apache.solr.client.solrj.response.TestSpellCheckResponse.testSpellCheckResponse(TestSpellCheckResponse.java:51)
{code}

The searcher gets registered via futures, but it seems unlikely that the test 
could get this far before the searcher opened during core initialization is set 
as the currSearcher. At any rate, the patch I submitted moves the bucket seeding 
code (which needs a write-lock) out of the firstSearcher code path and into the 
SolrCore ctor, which fixes the issue of needing a write-lock when a read-lock 
has already been acquired for a commit operation. It's still a question in my 
mind as to how the test can get to sending a commit when {{currSearcher == 
null}} ... any thoughts on that?
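
For anyone who wants the shape of the fix without reading the patch, a rough sketch follows. This is not the patch itself: {{SeededVersionStore}}, its fields and its methods are hypothetical stand-ins, and the only assumption is the one stated above, that seeding needs the write-lock and a commit already holds the read-lock when the first-searcher hook runs.

```java
import java.util.Arrays;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the core / update log / version info trio; not Solr code.
class SeededVersionStore {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final long[] buckets;

    // Seeding happens during construction, before any commit can hold the read lock.
    // (Taking the write lock here only mirrors the real code path; nothing else can
    // see the object yet.)
    SeededVersionStore(int numBuckets, long maxVersionFromIndex) {
        buckets = new long[numBuckets];
        lock.writeLock().lock();
        try {
            Arrays.fill(buckets, maxVersionFromIndex);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // A commit holds the read lock while the first-searcher hook runs.
    boolean commit() {
        lock.readLock().lock();
        try {
            onFirstSearcher();
            return true;
        } finally {
            lock.readLock().unlock();
        }
    }

    // The hook no longer seeds buckets, so it never asks for the write lock
    // while the calling thread is holding the read lock.
    private void onFirstSearcher() {
        // intentionally no write-lock acquisition here
    }

    long bucket(int i) {
        lock.readLock().lock();
        try {
            return buckets[i];
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

With the seeding done up front, {{new SeededVersionStore(4, 42L).commit()}} returns normally; had {{onFirstSearcher()}} called {{lock.writeLock().lock()}}, the same call would never return.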

> TestSpellCheckResponse stalled and never timed out -- possible VersionBucket 
> bug? (5.2 branch)
> ----------------------------------------------------------------------------------------------
>
>                 Key: SOLR-7587
>                 URL: https://issues.apache.org/jira/browse/SOLR-7587
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Hoss Man
>            Assignee: Timothy Potter
>            Priority: Blocker
>             Fix For: 5.2
>
>         Attachments: SOLR-7587.patch, jstack.1.txt, jstack.2.txt, 
> junit4-J0-20150522_181244_599.events, junit4-J0-20150522_181244_599.spill, 
> junit4-J0-20150522_181244_599.suites
>
>
> On the 5.2 branch (r1681250), I encountered a solrj test stalled for over 110 
> minutes before i finally killed it...
> {noformat}
>    [junit4] Suite: org.apache.solr.common.util.TestRetryUtil
>    [junit4] Completed [55/60] on J1 in 1.04s, 1 test
>    [junit4] 
>    [junit4] HEARTBEAT J0 PID(12147@tray): 2015-05-22T18:14:56, stalled for  121s at: TestSpellCheckResponse.testSpellCheckResponse
>    [junit4] HEARTBEAT J0 PID(12147@tray): 2015-05-22T18:15:56, stalled for  181s at: TestSpellCheckResponse.testSpellCheckResponse
> ...
>    [junit4] HEARTBEAT J0 PID(12147@tray): 2015-05-22T20:00:56, stalled for 6481s at: TestSpellCheckResponse.testSpellCheckResponse
>    [junit4] HEARTBEAT J0 PID(12147@tray): 2015-05-22T20:01:56, stalled for 6541s at: TestSpellCheckResponse.testSpellCheckResponse
>    [junit4] HEARTBEAT J0 PID(12147@tray): 2015-05-22T20:02:56, stalled for 6601s at: TestSpellCheckResponse.testSpellCheckResponse
> {noformat}
> I'll attach some jstack output as well as all the temp files from the J0 
> runner.


