Chris M. Hostetter created SOLR-13811:
-----------------------------------------
Summary: possible autoAddReplicas bug and/or
(Hdfs)AutoAddReplicasIntegrationTest refactoring / fixes
Key: SOLR-13811
URL: https://issues.apache.org/jira/browse/SOLR-13811
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Reporter: Chris M. Hostetter
I've noticed a pattern of failure behavior in Jenkins runs of
{{AutoAddReplicasIntegrationTest}} (which mostly manifests in the subclass
{{HdfsAutoAddReplicasIntegrationTest}}, probably due to timing) which indicates
either:
# the test is too contrived, and expects {{autoAddReplicas}} to kick in in a
situation where the current impl of {{NodeLostTrigger}} isn't smart enough to
handle
# {{NodeLostTrigger}} _should_ be smart enough to handle this, but isn't.
The test failure is currently somewhat finicky to reproduce, and depends on a
node being stopped, restarted, and stopped again, while an affected collection
is changed from {{autoAddReplicas=false}} to {{autoAddReplicas=true}} before
the second "stop".
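For illustration only, the ordering of events described above could be written down roughly as follows. This is a plain-Java sketch, not code from the test: {{ClusterOps}}, {{RecordingOps}} and {{runProblematicSequence}} are hypothetical names, no Solr API is used, and placing the {{autoAddReplicas}} change after the restart is an assumption (the only requirement stated above is that it happens before the second stop). It records the order of actions and makes no claim about how {{NodeLostTrigger}} reacts to them.

```java
import java.util.ArrayList;
import java.util.List;

public class AutoAddReplicasScenario {

    /** Hypothetical abstraction over the cluster actions involved; NOT a Solr API. */
    interface ClusterOps {
        void stopNode(String node);
        void startNode(String node);
        void setAutoAddReplicas(String collection, boolean enabled);
    }

    /** Records each action as a string so the order of events can be inspected. */
    static class RecordingOps implements ClusterOps {
        final List<String> events = new ArrayList<>();

        @Override public void stopNode(String node) { events.add("stop " + node); }

        @Override public void startNode(String node) { events.add("start " + node); }

        @Override public void setAutoAddReplicas(String collection, boolean enabled) {
            events.add("modify " + collection + " autoAddReplicas=" + enabled);
        }
    }

    /**
     * Runs the sequence from the description: stop, restart, enable
     * autoAddReplicas, stop again. The collection is assumed to have been
     * created with autoAddReplicas=false beforehand.
     */
    static void runProblematicSequence(ClusterOps ops, String node, String collection) {
        ops.stopNode(node);                        // first stop
        ops.startNode(node);                       // restart
        ops.setAutoAddReplicas(collection, true);  // false -> true, before the second stop
        ops.stopNode(node);                        // second stop
    }

    public static void main(String[] args) {
        RecordingOps ops = new RecordingOps();
        runProblematicSequence(ops, "node2", "testCollection");
        ops.events.forEach(System.out::println);
    }
}
```

In a real test the {{ClusterOps}} implementation would drive an actual cluster, and the assertion of interest would come after the second stop; the recording implementation here only makes the ordering explicit.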
Regardless of which of the 2 above is true: the test itself is somewhat
convoluted. It creates a sequence of events (some randomized, some static) and
asserts specific outcomes after each, but the timing of scheduled triggers
like {{NodeLostTrigger}}, and the interplay of things like "pick a random node
to shutdown" with a subsequent "explicitly shut down node2" (even if it was the
node randomly shut down earlier) is confusing.
I'm creating this issue to track two tightly dependent objectives:
# refactoring this test to:
** better isolate the specific things it's trying to test in individual test
methods.
** have a singular test method that triggers the specific sequence of events
that is currently problematic (ideally in such a way that it reliably fails).
# AwaitsFix this new test method until someone with a better understanding of the
{{autoAddReplicas}} / {{NodeLostTrigger}} code can assess if the test is faulty
or the code being tested is faulty.