[
https://issues.apache.org/jira/browse/HDFS-12749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
TanYuxin updated HDFS-12749:
----------------------------
Description:
Our cluster currently has 7000+ DNs, 180+ million files, and 180+ million blocks.
After the SNN restarts, each DN calls BPServiceActor#reRegister to register.
However, the register RPC fails with an IOException because the NN is busy
processing block reports. The exception is caught in BPServiceActor#processCommand.
The caught IOException is shown below:
{code:java}
WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Error processing datanode Command
java.io.IOException: Failed on local exception: java.io.IOException: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.14.110.33:24562 remote=namenode.host.03/10.14.27.17:8040]; Host Details : local host is: "datanode-2220/10.14.110.33"; destination host is: "namenode.host.03":8040;
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773)
    at org.apache.hadoop.ipc.Client.call(Client.java:1474)
    at org.apache.hadoop.ipc.Client.call(Client.java:1407)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy13.registerDatanode(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:126)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:793)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.reRegister(BPServiceActor.java:926)
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:604)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:898)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:711)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:864)
    at java.lang.Thread.run(Thread.java:745)
{code}
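One possible way to tolerate such a timeout, sketched here purely as an illustration and not as the actual HDFS code or patch, is to retry the registration call when it fails with an IOException instead of giving up after the first failure. RegisterCall, registerWithRetry, and the attempt/sleep parameters below are hypothetical names introduced for this sketch; they are not Hadoop APIs.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

class ReRegisterRetrySketch {
    /** Hypothetical stand-in for the registration RPC; not a Hadoop interface. */
    interface RegisterCall {
        void register() throws IOException;
    }

    /**
     * Calls register() until it succeeds or maxAttempts is exhausted.
     * Returns the number of attempts used; rethrows the last IOException
     * if every attempt failed.
     */
    static int registerWithRetry(RegisterCall call, int maxAttempts, long sleepMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                call.register();
                return attempt;
            } catch (IOException e) {
                // SocketTimeoutException is a subclass of IOException.
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMillis); // fixed pause before the next try
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated RPC: times out twice, then succeeds on the third call.
        RegisterCall flaky = () -> {
            if (++calls[0] < 3) {
                throw new SocketTimeoutException("60000 millis timeout");
            }
        };
        int attempts = registerWithRetry(flaky, 5, 10L);
        System.out.println("registered after " + attempts + " attempts");
    }
}
```

Whether a retry like this belongs in reRegister, in processCommand, or elsewhere is a design question for the actual fix; the sketch only shows the retry pattern itself.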
was: After the SNN restarts, the DN calls BPServiceActor#reRegister to
register. But when the DN registers with the NN, the SNN will return
> DN may not send block report to NN after NN restart
> ---------------------------------------------------
>
> Key: HDFS-12749
> URL: https://issues.apache.org/jira/browse/HDFS-12749
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: TanYuxin
>