[
https://issues.apache.org/jira/browse/HDDS-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HDDS-2214:
---------------------------------
Labels: pull-request-available (was: )
> TestSCMContainerPlacementRackAware has an intermittent failure
> --------------------------------------------------------------
>
> Key: HDDS-2214
> URL: https://issues.apache.org/jira/browse/HDDS-2214
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Reporter: Marton Elek
> Assignee: Marton Elek
> Priority: Major
> Labels: pull-request-available
>
> For example from the nightly build:
> {code:java}
> <testcase name="testNoFallback[8]"
> classname="org.apache.hadoop.hdds.scm.container.placement.algorithms.TestSCMContainerPlacementRackAware"
> time="0.014">
> <failure type="java.lang.AssertionError">java.lang.AssertionError
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at org.apache.hadoop.hdds.scm.container.placement.algorithms.TestSCMContainerPlacementRackAware.testNoFallback(TestSCMContainerPlacementRackAware.java:276)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> {code}
> The problem is in testNoFallback:
> Say we have 11 nodes (from the test parameter) and we would like to choose 5
> nodes (hard-coded in the test).
> The first two replicas are chosen from the same rack and all the others from
> different racks. With only three racks this is not possible, so we expect a
> failure.
> However, the test also asserts that the success count is at least 3. This is
> true only if the first two replicas are placed on rack1 (5 nodes) or
> rack2 (5 nodes). If the first replica is placed on rack3 (one node), the
> placement fails immediately:
>
> The lucky case, where the success count is > 3:
> {code:java}
> rack1 -- node1
> rack1 -- node2 -- FIRST replica
> rack1 -- node3 -- SECOND replica
> rack1 -- node4
> rack1 -- node5
> rack2 -- node6
> rack2 -- node7 -- THIRD replica
> rack2 -- node8
> rack2 -- node9
> rack2 -- node10
> rack3 -- node11 -- FOURTH replica{code}
> The specific case where the success count == 1, because the second replica
> cannot be placed on rack3 (this is when the test fails):
> {code:java}
> rack1 -- node1
> rack1 -- node2
> rack1 -- node3
> rack1 -- node4
> rack1 -- node5
> rack2 -- node6
> rack2 -- node7
> rack2 -- node8
> rack2 -- node9
> rack2 -- node10
> rack3 -- node11 -- FIRST replica{code}
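The two cases above can be reproduced with a small model. What follows is only a sketch of the rule as described in this issue (second replica on the same rack as the first, every later replica on a rack not used yet, no fallback); it is not the real SCMContainerPlacementRackAware code, and the class name `PlacementSketch` and the method `successCount` are hypothetical names introduced for illustration.

```java
// Simplified model of the placement rule described in this issue.
// Assumption: NOT the real SCMContainerPlacementRackAware implementation.
final class PlacementSketch {

    /**
     * Counts how many replicas can be placed when the first replica lands
     * on rack index firstRack. Replica 2 must share that rack (on another
     * node); each later replica needs a rack that has not been used yet.
     */
    static int successCount(int[] nodesPerRack, int firstRack, int wanted) {
        if (wanted <= 0 || nodesPerRack[firstRack] < 1) {
            return 0;
        }
        boolean[] used = new boolean[nodesPerRack.length];
        used[firstRack] = true;
        int placed = 1; // first replica
        if (wanted == 1) {
            return placed;
        }
        // Second replica: same rack, different node, no fallback.
        if (nodesPerRack[firstRack] < 2) {
            return placed;
        }
        placed++;
        // Remaining replicas: one per rack that is still unused.
        while (placed < wanted) {
            int next = -1;
            for (int r = 0; r < nodesPerRack.length; r++) {
                if (!used[r] && nodesPerRack[r] > 0) {
                    next = r;
                    break;
                }
            }
            if (next < 0) {
                break; // no unused rack left, placement stops here
            }
            used[next] = true;
            placed++;
        }
        return placed;
    }

    public static void main(String[] args) {
        int[] racks = {5, 5, 1}; // rack1, rack2, rack3 from the example
        for (int first = 0; first < racks.length; first++) {
            System.out.println("first replica on rack" + (first + 1)
                + " -> success count " + successCount(racks, first, 5));
        }
    }
}
```

In this model, starting on rack1 or rack2 gives a success count of 4, while starting on rack3 gives 1, so an assertion of "at least 3" can only pass when the randomly chosen first rack has at least two nodes.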