[ https://issues.apache.org/jira/browse/SOLR-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943010#comment-16943010 ]

ASF subversion and git services commented on SOLR-13811:
--------------------------------------------------------

Commit 18bf61504fbd9d8becff1a572642b4207dc7d54c in lucene-solr's branch 
refs/heads/branch_8x from Chris M. Hostetter
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=18bf615 ]

SOLR-13811: Refactor AutoAddReplicasIntegrationTest to isolate problematic 
situation into an AwaitsFix test method

(cherry picked from commit a57ec148e52507104fdf0f99381d2b485fa846fc)


> possible autoAddReplicas bug and/or (Hdfs)AutoAddReplicasIntegrationTest 
> refactoring / fixes
> --------------------------------------------------------------------------------------------
>
>                 Key: SOLR-13811
>                 URL: https://issues.apache.org/jira/browse/SOLR-13811
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Chris M. Hostetter
>            Priority: Major
>
> I've noticed a pattern of failure behavior in Jenkins runs of 
> {{AutoAddReplicasIntegrationTest}} (which mostly manifests in the subclass 
> {{HdfsAutoAddReplicasIntegrationTest}}, probably due to timing) which 
> indicates either:
>  # the test is too contrived, and expects {{autoAddReplicas}} to kick in 
> in a situation that the current impl of {{NodeLostTrigger}} isn't smart 
> enough to handle
>  # {{NodeLostTrigger}} _should_ be smart enough to handle this, but isn't.
> The test failure is currently somewhat finicky to reproduce, and depends on a 
> node being stopped, restarted, and stopped again, while an affected 
> collection is changed from {{autoAddReplicas=false}} to 
> {{autoAddReplicas=true}} before the second "stop".
> Regardless of which of the 2 above is true, the test itself is somewhat 
> convoluted. It creates a sequence of events (some randomized, some static) 
> and asserts specific outcomes after each, but the timing of scheduled 
> triggers like {{NodeLostTrigger}}, and the interplay of things like "pick a 
> random node to shut down" with a subsequent "explicitly shut down node2" (even 
> if it was the node randomly shut down earlier), is confusing.
> I'm creating this issue to track two tightly dependent objectives:
>  # refactoring this test to:
>  ** better isolate the specific things it's trying to test in individual test 
> methods.
>  ** have a singular test method that triggers the specific sequence of events 
> that is currently problematic (ideally in such a way that it reliably fails).
>  # AwaitsFix this new test method until someone with a better understanding of 
> the {{autoAddReplicas}} / {{NodeLostTrigger}} code can assess whether the test 
> or the code being tested is faulty.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
