[
https://issues.apache.org/jira/browse/HDFS-15162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17273266#comment-17273266
]
JiangHua Zhu commented on HDFS-15162:
-------------------------------------
[~ayushtkn], I saw your comment, and I agree with it. When the DN fails to connect
to the NN, it usually means the NN is under pressure or the connection dropped
partway through.
I recently hit a related problem: while the DN was connecting to the NN, an
exception finally surfaced after many retries (50 in this case). The log is
as follows:
2021-01-01 17:55:21,099 [15993307503]-INFO [clusterxxxx lifeline to xxxx/xxxx:port:Client$Connection@948]-Retrying connect to server: xxxx/xxxx:port. Already tried 49 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2021-01-12 17:55:21,100 [15993307504]-WARN [clusterxxxx lifeline to xxxx/xxxx:port:BPServiceActor$LifelineSender@1008]-IOException in LifelineSender for Block pool xxxx (Datanode Uuid xxxx) service to xxxx/xxxx:port
java.net.ConnectException: Call From xxxx/xxxx to xxxx:port failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
        at sun.reflect.GeneratedConstructorAccessor68.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:824)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:754)
        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1511)
        at org.apache.hadoop.ipc.Client.call(Client.java:1453)
        at org.apache.hadoop.ipc.Client.call(Client.java:1363)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
        at com.sun.proxy.$Proxy21.sendLifeline(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.DatanodeLifelineProtocolClientSideTranslatorPB.sendLifeline(DatanodeLifelineProtocolClientSideTranslatorPB.java:100)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$LifelineSender.sendLifeline(BPServiceActor.java:1074)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$LifelineSender.sendLifelineIfDue(BPServiceActor.java:1058)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$LifelineSender.run(BPServiceActor.java:1003)
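(For reference, the retry policy named in the log comes from Hadoop's org.apache.hadoop.io.retry.RetryPolicies; below is a minimal sketch of building the same policy, 50 retries with a fixed 1000 ms sleep, assuming the standard RetryPolicies/RetryPolicy API.)
{code:java}
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.io.retry.RetryPolicies;
import org.apache.hadoop.io.retry.RetryPolicy;

public class LifelineRetryPolicyExample {
  public static void main(String[] args) {
    // Same shape as the policy printed in the log above:
    // RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
    RetryPolicy policy = RetryPolicies.retryUpToMaximumCountWithFixedSleep(
        50,                      // maxRetries
        1000,                    // sleepTime
        TimeUnit.MILLISECONDS);  // time unit for sleepTime

    System.out.println("retry policy is " + policy);
  }
}
{code}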
An FBR should not be triggered in a case like this.
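As a rough sketch of the decision this JIRA describes (every name below, including the class, methods, and config key, is a hypothetical stand-in for the idea, not the actual patch):
{code:java}
// Illustrative only: a regular, interval-based FBR is sent just when one of the
// trigger conditions (failover, DiskError, heartbeat/RPC exception,
// re-registration) has been seen since the last report.
public class BlockReportTriggerTracker {

  // Assumed config key for the opt-in switch (off by default per the NOTE below).
  public static final String DFS_BLOCKREPORT_SEND_ONLY_ON_TRIGGER =
      "dfs.blockreport.send.only.on.trigger"; // hypothetical name

  private volatile boolean failoverSeen;
  private volatile boolean diskErrorSeen;
  private volatile boolean heartbeatRpcFailed;
  private volatile boolean reRegistered;

  // Heartbeat/registration paths would flip these flags; a failed lifeline
  // alone (as in the log above) would deliberately not flip any of them.
  public void onFailover()           { failoverSeen = true; }
  public void onDiskError()          { diskErrorSeen = true; }
  public void onHeartbeatException() { heartbeatRpcFailed = true; }
  public void onReRegistration()     { reRegistered = true; }

  /** Decide whether the regular, interval-based FBR should actually be sent. */
  public boolean shouldSendScheduledBlockReport(boolean optimizationEnabled) {
    if (!optimizationEnabled) {
      return true; // old behavior: always send at the regular interval
    }
    boolean triggered =
        failoverSeen || diskErrorSeen || heartbeatRpcFailed || reRegistered;
    if (triggered) {
      // Reset so the next interval starts clean.
      failoverSeen = diskErrorSeen = heartbeatRpcFailed = reRegistered = false;
    }
    return triggered; // otherwise skip this BR and wait for the next interval
  }
}
{code}
With a rule shaped like this, the lifeline ConnectException above would not set any trigger, so the scheduled BR would simply slide to the next interval, which is the behavior I would expect.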
> Optimize frequency of regular block reports
> -------------------------------------------
>
> Key: HDFS-15162
> URL: https://issues.apache.org/jira/browse/HDFS-15162
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Ayush Saxena
> Assignee: Ayush Saxena
> Priority: Critical
>
> Avoid sending the block report at the regular interval if there has been no
> failover, DiskError, or exception while connecting to the Namenode.
> This JIRA intends to limit regular block reports to those scenarios and to
> datanode re-registration, to eliminate the overhead of processing BlockReports
> at the Namenode in huge clusters.
> *Eg.* If a block report was sent at 00:00 and the next is scheduled at 06:00,
> and none of the above scenarios has occurred, the DN will skip that BR and
> schedule the next one for 12:00. If something of that sort happens between
> 06:00 and 12:00, it will send the BR normally.
> *NOTE*: This would be optional and turned off by default. A configuration
> would be added to enable it.