[ https://issues.apache.org/jira/browse/SOLR-7836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705052#comment-14705052 ]

Yonik Seeley commented on SOLR-7836:
------------------------------------

bq. pulls out the problematic open searcher in ulog.add to a separate method.

There are a few areas with complex synchronization that should not be changed 
unless one is confident of understanding why all the synchronization was 
there in the first place.  Having the tests pass isn't a high enough bar for 
these areas, because it is so difficult to get a test to actually expose 
subtle race conditions or thread-safety issues.  This comes back to my original 
"get it back in my head" comment - I don't feel comfortable messing with this 
stuff either until I've really internalized the bigger picture again... and it 
doesn't last ;-)

For the specific case above, one can't just take what was one synchronized 
block and break it up into two.  It certainly creates race conditions and 
breaks the invariants we try to keep.  The specific invariant here is that if 
it's not in the tlog maps, then it is guaranteed to be in the realtime reader.  
Hopefully some of our tests would fail with this latest patch... but it's hard 
stuff to test.
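To make the invariant concrete, here is a minimal, hypothetical Java model. `TlogModel`, its `map`, and `realtimeReader` are illustrative stand-ins, not Solr's `UpdateLog` API. Opening the realtime view and clearing the tlog map inside one synchronized block keeps "not in the map implies visible in the reader" true at every point another thread can observe. Splitting the same work into two blocks opens a window in which a concurrent `add()` lands in the map and is then cleared without ever having been seen by the reader.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the tlog/realtime-reader invariant; not Solr's UpdateLog.
final class TlogModel {
    // Stand-in for the tlog maps: ids added since the last realtime reader was opened.
    private final Map<String, String> map = new HashMap<>();
    // Stand-in for the realtime reader: ids it can see.
    private final Set<String> realtimeReader = new HashSet<>();

    synchronized void add(String id, String doc) {
        map.put(id, doc);
    }

    // Single critical section: "open" the new reader and clear the map atomically,
    // so an id is never missing from both places at once.
    synchronized void openRealtimeSearcherAndClear() {
        realtimeReader.addAll(map.keySet());
        map.clear();
    }

    // Broken variant: the same work split into two critical sections.
    // An add() that runs between them is cleared from the map below even though
    // the reader "opened" in the first block never saw it.
    void openRealtimeSearcherAndClearSplit() {
        synchronized (this) {
            realtimeReader.addAll(map.keySet());
        }
        // Race window: another thread may call add() here.
        synchronized (this) {
            map.clear();
        }
    }

    // The invariant: every added id is findable in the map or in the reader.
    synchronized boolean isFindable(String id) {
        return map.containsKey(id) || realtimeReader.contains(id);
    }

    public static void main(String[] args) {
        TlogModel log = new TlogModel();
        log.add("1", "doc1");
        log.openRealtimeSearcherAndClear();
        System.out.println(log.isFindable("1")); // prints true
    }
}
```

In a single-threaded run the split variant also finds every id, which is exactly why tests rarely catch this kind of breakage: the bug only shows up when another thread hits the window between the two blocks.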

I worked up a patch that passed down the IndexWriter (it needs to be passed 
*all* the way down to SolrCore.openSearcher to actually avoid deadlocks).  That 
ended up changing more code than I'd like... so now I'm working up a patch to 
make IW locking re-entrant.  That approach should be less fragile going forward 
(i.e. less likely to easily introduce a deadlock through seemingly unrelated 
changes).
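The difference between a non-reentrant and a re-entrant lock can be sketched with standard `java.util.concurrent` primitives. In this hypothetical example (`WriterLockDemo` and its methods are illustrative, not Solr code), a thread that already holds a non-reentrant lock and then calls down into code needing the same lock ends up waiting on itself, while a re-entrant lock just increments a hold count. This is one way to read why re-entrant IW locking is less fragile than passing the IndexWriter all the way down: a new call path that asks for the writer again does not turn into a self-deadlock.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration of re-entrant vs non-reentrant locking;
// not Solr's DefaultSolrCoreState implementation.
final class WriterLockDemo {
    // Non-reentrant: a one-permit semaphore has no owning thread, so the
    // holder blocks if it tries to acquire again.
    private final Semaphore nonReentrant = new Semaphore(1);
    // Re-entrant: the owning thread may acquire again; a hold count is kept.
    private final ReentrantLock reentrant = new ReentrantLock();

    // The outer call holds the "writer lock", then calls down into code
    // (think of an openSearcher-like path) that asks for the same lock again.
    boolean nestedWithSemaphore() {
        try {
            nonReentrant.acquire();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        try {
            // Bounded wait so the demo reports failure instead of hanging forever.
            boolean got = nonReentrant.tryAcquire(100, TimeUnit.MILLISECONDS);
            if (got) {
                nonReentrant.release();
            }
            return got;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            nonReentrant.release();
        }
    }

    boolean nestedWithReentrantLock() {
        reentrant.lock();
        try {
            // The same thread re-acquires immediately; no deadlock.
            boolean got = reentrant.tryLock();
            if (!got) {
                return false;
            }
            try {
                return reentrant.getHoldCount() == 2;
            } finally {
                reentrant.unlock();
            }
        } finally {
            reentrant.unlock();
        }
    }

    public static void main(String[] args) {
        WriterLockDemo demo = new WriterLockDemo();
        System.out.println("non-reentrant nested acquire: " + demo.nestedWithSemaphore()); // prints false
        System.out.println("reentrant nested acquire: " + demo.nestedWithReentrantLock()); // prints true
    }
}
```

The bounded `tryAcquire` only keeps the demo from hanging; code that re-acquires a non-reentrant lock on the same thread without a timeout would simply block forever.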

> Possible deadlock when closing refcounted index writers.
> --------------------------------------------------------
>
>                 Key: SOLR-7836
>                 URL: https://issues.apache.org/jira/browse/SOLR-7836
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>             Fix For: Trunk, 5.4
>
>         Attachments: SOLR-7836-reorg.patch, SOLR-7836-synch.patch, 
> SOLR-7836.patch, SOLR-7836.patch, SOLR-7836.patch, deadlock_3.res.zip, 
> deadlock_5_pass_iw.res.zip, deadlock_test
>
>
> Preliminary patch for what looks like a possible race condition between 
> writerFree and pauseWriter in DefaultSolrCoreState.
> Looking for comments and/or why I'm completely missing the boat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
