[ https://issues.apache.org/jira/browse/HDFS-8093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698672#comment-14698672 ]
Felix Borchers commented on HDFS-8093: -------------------------------------- grep /system/balancer.id hadoop-hdfs-namenode-devhmn02.rz.is.log.1 {code} ... 2015-08-14 00:30:03,843 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /system/balancer.id. BP-322804774-10.13.54.1-1412684451669 blk_1074256920_516292{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-4db312aa-bc23-47dc-b768-52a2d72b09d3:NORMAL:10.13.53.30:50010|RBW], ReplicaUnderConstruction[[DISK]DS-c7db1b58-8e25-435f-8af8-08b6754c021c:NORMAL:10.13.53.16:50010|RBW], ReplicaUnderConstruction[[DISK]DS-4457ae11-7684-4187-b4ad-56466d79fba2:NORMAL:10.13.53.19:50010|RBW]]} 2015-08-14 00:30:03,958 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* fsync: /system/balancer.id for DFSClient_NONMAPREDUCE_-1841368225_1 2015-08-14 00:30:03,986 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /system/balancer.id. BP-322804774-10.13.54.1-1412684451669 blk_1074256921_516293{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-8f3d8860-b977-4b7b-b681-d25c112ad1f3:NORMAL:10.13.53.14:50010|RBW], ReplicaUnderConstruction[[DISK]DS-abb5362f-6d29-478f-a678-53f09c096871:NORMAL:10.13.53.12:50010|RBW], ReplicaUnderConstruction[[DISK]DS-b02f3ebc-955e-4e11-82df-dc51278dc06f:NORMAL:10.13.53.17:50010|RBW]]} 2015-08-14 00:30:04,002 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* fsync: /system/balancer.id for DFSClient_NONMAPREDUCE_-1841368225_1 2015-08-14 00:46:44,975 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:SIMPLE) cause:org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /system/balancer.id (inode 709043): File does not exist. Holder DFSClient_NONMAPREDUCE_-1841368225_1 does not have any open files. 2015-08-14 00:46:44,975 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.complete from 10.13.52.1:58633 Call#220 Retry#0: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /system/balancer.id (inode 709043): File does not exist. Holder DFSClient_NONMAPREDUCE_-1841368225_1 does not have any open files. ... {code} The LogMessages are between the two timestamps are: {code} 2015-08-14 00:30:03,843 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /system/balancer.id. BP-322804774-10.13.54.1-1412684451669 blk_1074256920_516292{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-4db312aa-bc23-47dc-b768-52a2d72b09d3:NORMAL:10.13.53.30:50010|RBW], ReplicaUnderConstruction[[DISK]DS-c7db1b58-8e25-435f-8af8-08b6754c021c:NORMAL:10.13.53.16:50010|RBW], ReplicaUnderConstruction[[DISK]DS-4457ae11-7684-4187-b4ad-56466d79fba2:NORMAL:10.13.53.19:50010|RBW]]} 2015-08-14 00:30:03,958 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* fsync: /system/balancer.id for DFSClient_NONMAPREDUCE_-1841368225_1 2015-08-14 00:30:03,986 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /system/balancer.id. BP-322804774-10.13.54.1-1412684451669 blk_1074256921_516293{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-8f3d8860-b977-4b7b-b681-d25c112ad1f3:NORMAL:10.13.53.14:50010|RBW], ReplicaUnderConstruction[[DISK]DS-abb5362f-6d29-478f-a678-53f09c096871:NORMAL:10.13.53.12:50010|RBW], ReplicaUnderConstruction[[DISK]DS-b02f3ebc-955e-4e11-82df-dc51278dc06f:NORMAL:10.13.53.17:50010|RBW]]} 2015-08-14 00:30:04,000 INFO BlockStateChange: BLOCK* addBlock: block blk_1074256920_516292 on node 10.13.53.16:50010 size 134217728 does not belong to any file 2015-08-14 00:30:04,000 INFO BlockStateChange: BLOCK* InvalidateBlocks: add blk_1074256920_516292 to 10.13.53.16:50010 2015-08-14 00:30:04,000 INFO BlockStateChange: BLOCK* BlockManager: ask 10.13.53.16:50010 to delete [blk_1074256920_516292] 2015-08-14 00:30:04,000 INFO BlockStateChange: BLOCK* BlockManager: ask 10.13.53.14:50010 to delete [blk_1074256910_516282] 2015-08-14 00:30:04,002 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* fsync: /system/balancer.id for DFSClient_NONMAPREDUCE_-1841368225_1 2015-08-14 00:30:04,213 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.13.53.14:50010 is added to blk_1074256517_515889 size 9460 2015-08-14 00:30:04,214 INFO BlockStateChange: BLOCK* InvalidateBlocks: add blk_1074256517_515889 to 10.13.53.30:50010 2015-08-14 00:30:04,214 INFO BlockStateChange: BLOCK* chooseExcessReplicates: ([DISK]DS-4db312aa-bc23-47dc-b768-52a2d72b09d3:NORMAL:10.13.53.30:50010, blk_1074256517_515889) is added to invalidated blocks set {code} > BP does not exist or is not under Constructionnull > -------------------------------------------------- > > Key: HDFS-8093 > URL: https://issues.apache.org/jira/browse/HDFS-8093 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover > Affects Versions: 2.6.0 > Environment: Centos 6.5 > Reporter: LINTE > > HDFS balancer run during several hours blancing blocs beetween datanode, it > ended by failing with the following error. > getStoredBlock function return a null BlockInfo. > java.io.IOException: Bad response ERROR for block > BP-970443206-192.168.0.208-1397583979378:blk_1086729930_13046030 from > datanode 192.168.0.18:1004 > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:897) > 15/04/08 05:52:51 WARN hdfs.DFSClient: Error Recovery for block > BP-970443206-192.168.0.208-1397583979378:blk_1086729930_13046030 in pipeline > 192.168.0.63:1004, 192.168.0.1:1004, 192.168.0.18:1004: bad datanode > 192.168.0.18:1004 > 15/04/08 05:52:51 WARN hdfs.DFSClient: DataStreamer Exception > org.apache.hadoop.ipc.RemoteException(java.io.IOException): > BP-970443206-192.168.0.208-1397583979378:blk_1086729930_13046030 does not > exist or is not under Constructionnull > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkUCBlock(FSNamesystem.java:6913) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.updateBlockForPipeline(FSNamesystem.java:6980) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.updateBlockForPipeline(NameNodeRpcServer.java:717) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolServerSideTranslatorPB.java:931) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) > at org.apache.hadoop.ipc.Client.call(Client.java:1468) > at org.apache.hadoop.ipc.Client.call(Client.java:1399) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) > at com.sun.proxy.$Proxy11.updateBlockForPipeline(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolTranslatorPB.java:877) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy12.updateBlockForPipeline(Unknown Source) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1266) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1004) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:548) > 15/04/08 05:52:51 ERROR hdfs.DFSClient: Failed to close inode 19801755 > org.apache.hadoop.ipc.RemoteException(java.io.IOException): > BP-970443206-192.168.0.208-1397583979378:blk_1086729930_13046030 does not > exist or is not under Constructionnull > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkUCBlock(FSNamesystem.java:6913) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.updateBlockForPipeline(FSNamesystem.java:6980) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.updateBlockForPipeline(NameNodeRpcServer.java:717) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolServerSideTranslatorPB.java:931) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) > at org.apache.hadoop.ipc.Client.call(Client.java:1468) > at org.apache.hadoop.ipc.Client.call(Client.java:1399) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) > at com.sun.proxy.$Proxy11.updateBlockForPipeline(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolTranslatorPB.java:877) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy12.updateBlockForPipeline(Unknown Source) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1266) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1004) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:548) -- This message was sent by Atlassian JIRA (v6.3.4#6332)