[ 
https://issues.apache.org/jira/browse/SOLR-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-4629:
---------------------------

    Attachment: 
SOLR-4629_emptycommittest_and_numfoundrefactor_and_waitparam.patch


After adding a lot of debug logs, and walking through the results of lots of 
failed tests compared to successful tests, and a lot of vigorous, physical 
consultation between my forehead and my desk, i think i've finally tracked down 
the cause of all the "expected 2 got 3" failures from checkForSingleIndex.

The problem in a nutshell is one of concurrency. When the test thread makes a 
request to the master or to the slave, those requests are handled by a jetty 
thread which (via SolrDispatchFilter) creates a SolrQueryRequest, which has a 
searcher ref, which has a Directory ref.  When the request is done, the 
SolrQueryRequest is closed, which releases the searcher ref, which releases the 
directory ref -- but by the time this happens, the response has already been 
returned to the "client" (the test thread), and the test thread may enter 
checkForSingleIndex to acquire the lock on the CachingDirectoryFactory (to check 
the list of cached paths) before the resources from a previous request have been 
completely released -- so the test fails because a Directory from an old 
request still hasn't been released.

Example...

{noformat}
Time   Test-Thread              Jetty-Thread-N
0      http request->jetty
1                               accept http request
2                               create solr query request
3                               incref searcher, incref dir
4                               process solr query request
5                               test thread<-write http response
6      process response
7      ...    
8      assert(2=num dirs)
9                               decref searcher, decref dir, release dir
{noformat}
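
The ordering in that timeline can be forced deterministically in a toy program. This is only a sketch of the interleaving, not Solr code: the class name, the directory names, and the use of latches to pin the "unlucky" scheduling are all assumptions made for illustration. It models a third directory that is kept alive only by an in-flight request and is released after the response has already been sent.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

/** Toy reproduction of the timeline; every name here is hypothetical. */
class ReleaseAfterResponseSketch {

  /** Returns {dirs tracked when the client checks, dirs tracked after the request is closed}. */
  static int[] run() {
    // "index.old" is still tracked only because the in-flight request holds a ref to it.
    Set<String> tracked = ConcurrentHashMap.newKeySet();
    tracked.add("data");
    tracked.add("index.new");
    tracked.add("index.old");

    CountDownLatch responseWritten = new CountDownLatch(1);
    CountDownLatch clientHasChecked = new CountDownLatch(1);

    Thread jetty = new Thread(() -> {
      responseWritten.countDown();          // time 5: response goes back to the client
      try {
        clientHasChecked.await();           // pin the scheduling shown in the timeline
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      tracked.remove("index.old");          // time 9: decref searcher, decref dir, release dir
    });
    jetty.start();

    try {
      responseWritten.await();              // time 6: client processes the response
      int seenAtCheck = tracked.size();     // time 8: the check that expects 2
      clientHasChecked.countDown();
      jetty.join();
      return new int[] {seenAtCheck, tracked.size()};
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }
}
```

Calling `ReleaseAfterResponseSketch.run()` yields a count of 3 at check time and 2 once the request thread has finished, which is the same shape as the "expected 2 got 3" failure.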

I think the key change is to modify checkForSingleIndex so that instead of 
asserting exactly 2 paths in the cache, we assert that there are only 2 paths 
that are not "done" -- allowing for the possibility of other paths still being 
tracked because of requests still being closed.


The attached patch makes this change -- there are still some nocommits (in 
particular i completely commented out the replication core reloading to rule 
that out as a possible cause, but there's also some excessively absurd logging) 
but even if you ignore all that, after replacing 
"CachingDirectoryFactory.getPaths()" with 
"CachingDirectoryFactory.getLivePaths()" I have yet to see "expected:<2> but 
was:<3>" in any test run.  If you tweak that method to eliminate the 
"!val.doneWithDir" dir check, you should start seeing the failures come back.
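
To make the difference between the two methods concrete, here is a minimal sketch of the idea. It is not the code from the patch or from CachingDirectoryFactory: the class `SketchDirectoryCache`, the `CacheValue` type, the `refCnt` field, and the `open`/`incRef`/`release`/`doneWithDirectory` helpers are assumptions for illustration. Only the names `getPaths()`, `getLivePaths()`, and `doneWithDir` come from the description above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a directory cache; NOT the real CachingDirectoryFactory. */
class SketchDirectoryCache {

  /** Per-path bookkeeping (assumed layout). */
  static final class CacheValue {
    int refCnt = 1;               // the opener holds the first ref
    boolean doneWithDir = false;  // set once the owner no longer needs the dir
  }

  private final Map<String, CacheValue> byPath = new HashMap<>();

  synchronized void open(String path) { byPath.put(path, new CacheValue()); }

  synchronized void incRef(String path) { byPath.get(path).refCnt++; }

  synchronized void doneWithDirectory(String path) { byPath.get(path).doneWithDir = true; }

  /** Drops one ref; the entry disappears only when it is done AND unreferenced. */
  synchronized void release(String path) {
    CacheValue val = byPath.get(path);
    if (--val.refCnt == 0 && val.doneWithDir) {
      byPath.remove(path);
    }
  }

  /** Every tracked path, including ones only kept alive by a request still closing. */
  synchronized Set<String> getPaths() { return new HashSet<>(byPath.keySet()); }

  /** Only the paths that have not been marked done. */
  synchronized Set<String> getLivePaths() {
    Set<String> live = new HashSet<>();
    for (Map.Entry<String, CacheValue> e : byPath.entrySet()) {
      if (!e.getValue().doneWithDir) {
        live.add(e.getKey());
      }
    }
    return live;
  }
}
```

In this sketch, if a request still holds a ref to an old index dir that has been marked done, `getPaths()` reports three entries while `getLivePaths()` reports two; once the request releases its ref, both report two.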

I'll clean the patch up more tomorrow and run some more exhaustive tests to be 
sure i haven't broken anything, but i wanted to post what i had in case i got 
hit by a bus (and to ensure [[email protected]] doesn't see any flaw with 
my "getLivePaths()" change before i get too happy about it).


                
> Stronger standard replication testing.
> --------------------------------------
>
>                 Key: SOLR-4629
>                 URL: https://issues.apache.org/jira/browse/SOLR-4629
>             Project: Solr
>          Issue Type: Test
>          Components: replication (java)
>            Reporter: Mark Miller
>            Assignee: Mark Miller
>             Fix For: 4.3, 5.0, 4.2.1
>
>         Attachments: 
> SOLR-4629_emptycommittest_and_numfoundrefactor_and_waitparam.patch, 
> SOLR-4629_emptycommittest_and_numfoundrefactor_and_waitparam.patch, 
> SOLR-4629_emptycommittest_and_numfoundrefactor_and_waitparam.patch, 
> SOLR-4629_emptycommittest_and_numfoundrefactor_and_waitparam.patch
>
>
> I added to these tests recently, but there is a report on the list indicating 
> we may still be missing something. Most reports have been positive so far 
> after the 4.2 fixes, but I'd feel better after adding some more testing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
