[jira] [Commented] (SOLR-13781) TestContainerReqHandler.testPackageAPI failures imply race condition between update-package and delete-requesthandler

ASF subversion and git services (Jira) Thu, 19 Sep 2019 17:15:15 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-13781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16933874#comment-16933874
 ]


ASF subversion and git services commented on SOLR-13781:
--------------------------------------------------------

Commit 5a01a8b3622cf7547e71fa43d88235aeb18defa4 in lucene-solr's branch 
refs/heads/master from Chris M. Hostetter
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5a01a8b ]

SOLR-13781: AwaitsFix TestContainerReqHandler.testPackageAPI


> TestContainerReqHandler.testPackageAPI failures imply race condition between 
> update-package and delete-requesthandler
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-13781
>                 URL: https://issues.apache.org/jira/browse/SOLR-13781
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Hoss Man
>            Priority: Major
>         Attachments: apache_Lucene-Solr-Tests-8.x_587.testPackageAPI.txt, 
> egrep-out.apache_Lucene-Solr-Tests-8.x_587.testPackageAPI.txt, 
> egrep-out.local.log.txt, local.log.txt
>
>
> We're seeing roughly an 8% failure rate from 
> {{TestContainerReqHandler.testPackageAPI}}, with failures occurring on both 
> master and branch_8x, and on various Jenkins servers and various OSes.
> All of the failures occur at the same place: a V2 request to {{/node/ext}} to 
> verify that the {{requestHandler}} List is empty after issuing a 
> {{delete-requesthandler: 'bar'}} payload to the {{/cluster}} API. The logs 
> and failure message indicate that the {{'bar'}} request handler still exists 
> even though the assertion does a "sleep/retry" of the verification query 10 
> times.
> While I don't fully understand this test, or the underlying code being 
> tested, I spent a little time digging into the logs from some of these 
> Jenkins failures and comparing them to the logs I see generated when I get a 
> successful test run locally. I think what's happening here - and the 
> reason that {{delete-requesthandler}} seems to "fail" frequently in this test 
> method, but not in {{testSetClusterReqHandler}} - is that the prior 
> {{update-package}} command is still in progress.
> After the test code runs an {{update-package}} command, the test executes 
> requests against {{/node/ext/bar}} to verify that the {{version}} has changed 
> as a result of updating the package, but I suspect this is only looking at 
> the _metadata_ that has changed as a result of the {{update-package}} command 
> and not actually ensuring that the request handler has fully loaded. The 
> logs when this test fails seem to show that the zkCallback threads kicked 
> off by the {{update-package}} command are still running when the zkCallback 
> threads kicked off by the subsequent {{delete-requesthandler}} command are 
> running, and finish *after* them, "re-registering" the handler that was just 
> deleted.
> ----
> It's not 100% clear to me if this is _just_ a test bug - and it should be 
> monitoring something else to know when the request handler has finished 
> loading - or if this indicates a broader flaw in the design of how commands 
> like {{add-package}} / {{update-package}} / {{add-requesthandler}} / 
> {{delete-requesthandler}} should interact if/when they occur in close 
> temporal proximity.
> (i.e., if there are zkCallback watchers loading classes and initializing 
> objects as a result of cluster property changes, shouldn't there be some sort 
> of linearization/synchronization logic to ensure that they get executed in 
> the same order on all the nodes in the cluster?)
> ----
> More detail and log file attachments to follow...
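
The linearization question raised in the quoted description can be illustrated with a minimal sketch. This is not Solr's actual code: the class VersionedRegistry, its methods, and the integer "version" (standing in for something like the version of the cluster properties that triggered each callback) are all hypothetical names chosen for illustration. The idea is only that each callback carries the version of the change that caused it, and a callback that finishes late is ignored if a newer change has already been applied for the same handler.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Solr code: a registry that rejects stale callbacks.
class VersionedRegistry {
    // handler name -> version of the change that registered it
    private final Map<String, Integer> handlers = new HashMap<>();
    // handler name -> highest change version applied so far (register or delete)
    private final Map<String, Integer> lastApplied = new HashMap<>();

    // Registers (or reloads) a handler unless a newer change was already applied.
    public synchronized boolean register(String name, int version) {
        if (version <= lastApplied.getOrDefault(name, -1)) {
            return false; // stale callback: do not re-register
        }
        lastApplied.put(name, version);
        handlers.put(name, version);
        return true;
    }

    // Deletes a handler unless a newer change was already applied.
    public synchronized boolean delete(String name, int version) {
        if (version <= lastApplied.getOrDefault(name, -1)) {
            return false; // stale callback: ignore
        }
        lastApplied.put(name, version);
        handlers.remove(name);
        return true;
    }

    public synchronized boolean contains(String name) {
        return handlers.containsKey(name);
    }

    public static void main(String[] args) {
        VersionedRegistry registry = new VersionedRegistry();
        registry.register("bar", 1); // add-requesthandler
        registry.delete("bar", 3);   // delete-requesthandler callback finishes first
        registry.register("bar", 2); // slow update-package callback finishes last
        System.out.println(registry.contains("bar"));
    }
}
```

In this sketch the late callback for the older change is dropped, so the deleted handler stays deleted regardless of which thread finishes last. Whether a guard like this belongs in the production code or whether the test should instead wait for a different signal is exactly the open question in the description.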



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
