[ https://issues.apache.org/jira/browse/HDFS-9083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14978452#comment-14978452 ]
Rushabh S Shah commented on HDFS-9083:
--------------------------------------

[~jingzhao] [~mingma] [~brahmareddy]: Thanks for the reviews. Since Jenkins failed to run the tests, I ran all the HDFS tests locally. The following tests failed:
{noformat}
TestSecureNNWithQJM#testSecureMode
TestSecureNNWithQJM#testSecondaryNameNodeHttpAddressNotNeeded
TestAppendSnapshotTruncate#testAST
TestBalancer#testTwoReplicaShouldNotInSameDN
TestBalancer#testBalancerWithPinnedBlocks
TestBalancer#testBalancerWithZeroThreadsForMove
TestBalancerWithSaslDataTransfer#testBalancer0Integrity
TestBalancerWithSaslDataTransfer#testBalancer0Authentication
TestBalancerWithSaslDataTransfer#testBalancer0Privacy
TestBalancerWithNodeGroup#testBalancerWithNodeGroup
TestBalancerWithNodeGroup#testBalancerEndInNoMoveProgress
TestSaslDataTransfer#testServerSaslNoClientSasl
TestSaslDataTransfer#testClientAndServerDoNotHaveCommonQop
TestSaslDataTransfer#testAuthentication
TestSaslDataTransfer#testPrivacy
TestSaslDataTransfer#testNoSaslAndSecurePortsIgnored
TestSaslDataTransfer#testIntegrity
{noformat}
I ran these tests multiple times; all of them failed consistently, except TestAppendSnapshotTruncate#testAST, which failed intermittently. I also ran the failing tests without my patch and they still failed, so none of the failures are related to my patch. I will run test-patch.sh on my machine and upload the results shortly.

> Replication violates block placement policy.
> ---------------------------------------------
>
>                 Key: HDFS-9083
>                 URL: https://issues.apache.org/jira/browse/HDFS-9083
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: HDFS, namenode
>    Affects Versions: 2.6.0
>            Reporter: Rushabh S Shah
>            Assignee: Rushabh S Shah
>            Priority: Blocker
>         Attachments: HDFS-9083-branch-2.7.patch
>
>
> Recently we have been noticing many cases in which all the replicas of a
> block reside on the same rack.
> The block placement policy was honored during block creation, but after
> node-failure events occurring in a specific pattern, the block ends up in
> this state.
> On investigating further, I found that BlockManager#blockHasEnoughRacks
> depends on the config net.topology.script.file.name:
> {noformat}
> if (!this.shouldCheckForEnoughRacks) {
>   return true;
> }
> {noformat}
> We specify a custom DNSToSwitchMapping implementation via
> net.topology.node.switch.mapping.impl and no longer use the
> net.topology.script.file.name config.
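To make the mechanism concrete, here is a minimal sketch of how the rack check can be silently disabled. It assumes, as the description implies, that the flag is derived solely from whether net.topology.script.file.name is set; this is not the actual BlockManager source, and com.example.CustomDNSToSwitchMapping is a hypothetical class name.
{noformat}
import org.apache.hadoop.conf.Configuration;

public class RackCheckSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();

    // The cluster wires in a custom mapping class (hypothetical name)
    // and never sets net.topology.script.file.name.
    conf.set("net.topology.node.switch.mapping.impl",
        "com.example.CustomDNSToSwitchMapping");

    // Assumption: the flag is computed from the *script* key alone, so a
    // script-less cluster ends up with the rack check disabled.
    boolean shouldCheckForEnoughRacks =
        conf.get("net.topology.script.file.name") != null;

    // Prints false: blockHasEnoughRacks() would then short-circuit to
    // true for every block, regardless of the actual rack spread.
    System.out.println("shouldCheckForEnoughRacks = " + shouldCheckForEnoughRacks);
  }
}
{noformat}
Under that assumption, one natural direction for a fix (not necessarily what the attached patch does) is to derive the flag from the resolved topology itself, e.g. whether the cluster map knows about more than one rack, rather than from the script key alone.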
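For readers unfamiliar with the setup the description mentions, a mapping plugged in via net.topology.node.switch.mapping.impl looks roughly like the sketch below. The class name and the rack-derivation rule are illustrative only; the interface it implements is org.apache.hadoop.net.DNSToSwitchMapping.
{noformat}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.net.DNSToSwitchMapping;

public class CustomDNSToSwitchMapping implements DNSToSwitchMapping {
  @Override
  public List<String> resolve(List<String> names) {
    List<String> racks = new ArrayList<String>(names.size());
    for (String name : names) {
      // Illustrative rule: take the rack from the first hostname label,
      // e.g. "rack1-node7.example.com" -> "/rack1".
      racks.add("/" + name.split("-", 2)[0]);
    }
    return racks;
  }

  @Override
  public void reloadCachedMappings() {
    // Nothing cached in this sketch.
  }

  @Override
  public void reloadCachedMappings(List<String> names) {
    // Nothing cached in this sketch.
  }
}
{noformat}
Because a setup like this never touches net.topology.script.file.name, it hits exactly the code path quoted above.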