[ https://issues.apache.org/jira/browse/HDFS-13833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618292#comment-16618292 ]
Shweta edited comment on HDFS-13833 at 9/18/18 12:08 AM:
---------------------------------------------------------

Thanks [~knanasi] for reviewing the patch.
It was silly of me not to check for package-private access before submitting the patch; I have uploaded a patch with this change.
Also, the checkstyle warnings were related to the hidden field, i.e. the stats object; that is resolved in this patch because I no longer pass it as a parameter.

was (Author: shwetayakkali):
Thanks [~knanasi] for reviewing the patch.
It was silly of me not to check for package-private access before submitting the patch; I will update the patch.
Also, the checkstyle warnings were related to the hidden field, i.e. the stats object; that is resolved in this patch because I no longer pass it as a parameter.

> Failed to choose from local rack (location = /default); the second replica is
> not found, retry choosing ramdomly
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-13833
>                 URL: https://issues.apache.org/jira/browse/HDFS-13833
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Henrique Barros
>            Assignee: Shweta
>            Priority: Critical
>         Attachments: HDFS-13833.001.patch, HDFS-13833.002.patch
>
>
> I'm having an intermittent problem with block replication on Hadoop
> 2.6.0-cdh5.15.0
> with Cloudera CDH-5.15.0-1.cdh5.15.0.p0.21
>
> In my case we get this error at random intervals (after some hours), and
> with only one Datanode (for now, we are trying this Cloudera cluster for a
> POC).
> Here is the log.
> {code:java}
> Choosing random from 1 available nodes on node /default, scope=/default,
> excludedScope=null, excludeNodes=[]
> 2:38:20.527 PM DEBUG NetworkTopology
> Choosing random from 0 available nodes on node /default, scope=/default,
> excludedScope=null, excludeNodes=[192.168.220.53:50010]
> 2:38:20.527 PM DEBUG NetworkTopology
> chooseRandom returning null
> 2:38:20.527 PM DEBUG BlockPlacementPolicy
> [
> Node /default/192.168.220.53:50010 [
> Datanode 192.168.220.53:50010 is not chosen since the node is too busy
> (load: 8 > 0.0).
> 2:38:20.527 PM DEBUG NetworkTopology
> chooseRandom returning 192.168.220.53:50010
> 2:38:20.527 PM INFO BlockPlacementPolicy
> Not enough replicas was chosen. Reason:{NODE_TOO_BUSY=1}
> 2:38:20.527 PM DEBUG StateChange
> closeFile:
> /mobi.me/development/apps/flink/checkpoints/a5a6806866c1640660924ea1453cbe34/chk-2118/eef8bff6-75a9-43c1-ae93-4b1a9ca31ad9
> with 1 blocks is persisted to the file system
> 2:38:20.527 PM DEBUG StateChange
> *BLOCK* NameNode.addBlock: file
> /mobi.me/development/apps/flink/checkpoints/a5a6806866c1640660924ea1453cbe34/chk-2118/1cfe900d-6f45-4b55-baaa-73c02ace2660
> fileId=129628869 for DFSClient_NONMAPREDUCE_467616914_65
> 2:38:20.527 PM DEBUG BlockPlacementPolicy
> Failed to choose from local rack (location = /default); the second replica is
> not found, retry choosing ramdomly
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy$NotEnoughReplicasException:
>
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:784)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:694)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseLocalRack(BlockPlacementPolicyDefault.java:601)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseLocalStorage(BlockPlacementPolicyDefault.java:561)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTargetInOrder(BlockPlacementPolicyDefault.java:464)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:395)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:270)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:142)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:158)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1715)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3505)
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:694)
>     at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:219)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:507)
>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2281)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2277)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2275)
> {code}
> This part makes no sense at all:
> {code:java}
> load: 8 > 0.0{code}
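
One possible reading of the "load: 8 > 0.0" message, offered as an assumption rather than a confirmed diagnosis: the right-hand side may be a threshold derived from a cluster-wide average load, so if that average is reported as 0.0, any node with a non-zero load fails the check. The sketch below is a simplified, hypothetical model of such a check. The class name, method names and the factor of 2.0 are illustrative assumptions, not the Hadoop source.

```java
// Simplified, hypothetical model of a "node too busy" check; not the Hadoop source.
final class LoadCheckSketch {
    // Assumed multiplier applied to the cluster-wide average load.
    static final double LOAD_FACTOR = 2.0;

    // A node is rejected when its own load exceeds the scaled average.
    static boolean isTooBusy(int nodeLoad, double averageLoad) {
        double maxLoad = LOAD_FACTOR * averageLoad;
        return nodeLoad > maxLoad;
    }

    // Builds a message in the same shape as the one seen in the log.
    static String describe(int nodeLoad, double averageLoad) {
        double maxLoad = LOAD_FACTOR * averageLoad;
        return "(load: " + nodeLoad + " > " + maxLoad + ")";
    }

    public static void main(String[] args) {
        // With an average of 0.0 the threshold collapses to 0.0.
        System.out.println(isTooBusy(8, 0.0) + " " + describe(8, 0.0));
        // With an average of 8.0 the same node passes the check.
        System.out.println(isTooBusy(8, 8.0));
    }
}
```

Under these assumptions, a single Datanode with load 8 is rejected whenever the average used for the threshold is 0.0, which would match the NODE_TOO_BUSY reason in the log. Whether the average really is 0.0 here, and why, would need to be confirmed against the statistics the NameNode uses.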