[ https://issues.apache.org/jira/browse/HDDS-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiaoyu Yao updated HDDS-2214:
-----------------------------
    Fix Version/s: 0.5.0
       Resolution: Fixed
           Status: Resolved  (was: Patch Available)

Thanks [~elek] for the contribution and all for the reviews. I've merged the change to master.

> TestSCMContainerPlacementRackAware has an intermittent failure
> --------------------------------------------------------------
>
>                 Key: HDDS-2214
>                 URL: https://issues.apache.org/jira/browse/HDDS-2214
>             Project: Hadoop Distributed Data Store
>          Issue Type: Improvement
>            Reporter: Marton Elek
>            Assignee: Marton Elek
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.5.0
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> For example from the nightly build:
> {code:java}
> <testcase name="testNoFallback[8]" classname="org.apache.hadoop.hdds.scm.container.placement.algorithms.TestSCMContainerPlacementRackAware" time="0.014">
>   <failure type="java.lang.AssertionError">java.lang.AssertionError
>     at org.junit.Assert.fail(Assert.java:86)
>     at org.junit.Assert.assertTrue(Assert.java:41)
>     at org.junit.Assert.assertTrue(Assert.java:52)
>     at org.apache.hadoop.hdds.scm.container.placement.algorithms.TestSCMContainerPlacementRackAware.testNoFallback(TestSCMContainerPlacementRackAware.java:276)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> {code}
> The problem is in testNoFallback:
> Let's say we have 11 nodes (from the test parameter) and we would like to choose 5 nodes (hard-coded in the test).
> The first two replicas are chosen from the same rack and every other replica from a different rack. With only three racks this is not possible, so we expect a failure.
> But we have an assertion that the success count is at least 3. This is true only if the first two replicas are placed on rack1 (5 nodes) or rack2 (5 nodes). If the first replica is placed on rack3 (one node), placement fails immediately:
>
> Lucky case, when the success count is > 3:
> {code:java}
> rack1 -- node1
> rack1 -- node2 -- FIRST replica
> rack1 -- node3 -- SECOND replica
> rack1 -- node4
> rack1 -- node5
> rack2 -- node6
> rack2 -- node7 -- THIRD replica
> rack2 -- node8
> rack2 -- node9
> rack2 -- node10
> rack3 -- node11 -- FOURTH replica{code}
> The specific case when the success count == 1, because we can't choose the second replica on rack3 (this is when the test fails):
> {code:java}
> rack1 -- node1
> rack1 -- node2
> rack1 -- node3
> rack1 -- node4
> rack1 -- node5
> rack2 -- node6
> rack2 -- node7
> rack2 -- node8
> rack2 -- node9
> rack2 -- node10
> rack3 -- node11 -- FIRST replica{code}
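
The counting argument in the description can be checked with a small model. This is only a sketch under stated assumptions, not the actual SCMContainerPlacementRackAware code: the class name RackPlacementSketch and the method successCount are hypothetical, and the model assumes exactly the rule described in the issue (the second replica must share a rack with the first, and each later replica needs a rack that has not been used yet).

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model of the placement rule described in the issue.
// It does NOT call or reproduce the real Ozone placement policy.
class RackPlacementSketch {

    /**
     * Counts how many replicas can be placed before the rule can no longer
     * be satisfied.
     *
     * @param rackSizes number of nodes in each rack, e.g. {5, 5, 1}
     * @param firstRack index of the rack that receives the first replica
     * @param wanted    number of replicas requested (5 in the test)
     */
    static int successCount(int[] rackSizes, int firstRack, int wanted) {
        if (wanted <= 0 || rackSizes[firstRack] == 0) {
            return 0;
        }
        int placed = 1; // first replica lands on firstRack
        if (placed == wanted) {
            return placed;
        }
        // Second replica must be on the same rack, so that rack needs 2 nodes.
        if (rackSizes[firstRack] < 2) {
            return placed;
        }
        placed++;

        // Every further replica needs a rack that has not been used so far.
        Set<Integer> usedRacks = new HashSet<>();
        usedRacks.add(firstRack);
        for (int rack = 0; rack < rackSizes.length && placed < wanted; rack++) {
            if (!usedRacks.contains(rack) && rackSizes[rack] > 0) {
                usedRacks.add(rack);
                placed++;
            }
        }
        return placed;
    }

    public static void main(String[] args) {
        int[] racks = {5, 5, 1}; // rack1, rack2, rack3 from the description
        System.out.println(successCount(racks, 0, 5)); // prints 4
        System.out.println(successCount(racks, 1, 5)); // prints 4
        System.out.println(successCount(racks, 2, 5)); // prints 1
    }
}
```

In this model, starting on rack1 or rack2 yields a success count of 4, while starting on the single-node rack3 yields 1, so an assertion of "at least 3" fails whenever the first replica happens to land on rack3.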