ChenFolin created HDFS-10214:
--------------------------------
Summary: Checkpoint cannot be done by the Standby NameNode, because the
checkpoint may cause DataNode blockReport, blockReceivedAndDeleted, and
heartbeat RPC timeouts when the object count exceeds 100000000.
Key: HDFS-10214
URL: https://issues.apache.org/jira/browse/HDFS-10214
Project: Hadoop HDFS
Issue Type: New Feature
Components: ha, namenode
Affects Versions: 2.6.4, 2.5.0
Environment: 500 DataNodes.
137407265 files and directories, 129614074 blocks = 267021339 total filesystem
object(s)
Reporter: ChenFolin
Current cluster status:
137407265 files and directories, 129614074 blocks = 267021339 total filesystem
object(s).
Saving the namespace during a checkpoint takes more than 5 minutes.
While it runs, DataNode RPCs to the Standby NameNode time out.
The Standby NameNode then skips those DataNode RPC requests, because the
DataNode closes the socket channel once its RPC has timed out.
After a failover, there are many corrupt files.
So the checkpoint could be done by another component instead of the Standby
NameNode.
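
The failure sequence described above can be illustrated with a small standalone sketch. It is not HDFS code: it assumes that the checkpoint holds a namespace lock which DataNode request handling also needs, and it models the client-side RPC timeout with a timed lock attempt. The class name CheckpointStallSketch, the method requestServed, and the millisecond values are all hypothetical and chosen only for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative model only; none of these names come from Hadoop.
public class CheckpointStallSketch {

    /**
     * Returns true if a simulated DataNode request obtains the namespace lock
     * before its timeout expires, false if the simulated checkpoint holds the
     * lock for too long and the request gives up.
     */
    static boolean requestServed(long checkpointMillis, long rpcTimeoutMillis)
            throws InterruptedException {
        ReentrantLock namespaceLock = new ReentrantLock();
        CountDownLatch checkpointStarted = new CountDownLatch(1);

        // Simulated checkpoint: hold the lock for the whole save duration.
        Thread checkpointer = new Thread(() -> {
            namespaceLock.lock();
            try {
                checkpointStarted.countDown();
                Thread.sleep(checkpointMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                namespaceLock.unlock();
            }
        });
        checkpointer.start();
        checkpointStarted.await(); // make sure the checkpoint owns the lock first

        // Simulated DataNode request: wait for the lock at most rpcTimeoutMillis.
        boolean served = namespaceLock.tryLock(rpcTimeoutMillis, TimeUnit.MILLISECONDS);
        if (served) {
            namespaceLock.unlock();
        }
        checkpointer.join();
        return served;
    }

    public static void main(String[] args) throws InterruptedException {
        // Checkpoint shorter than the timeout: the request gets through.
        System.out.println("short checkpoint, request served: " + requestServed(50, 500));
        // Checkpoint longer than the timeout: the request gives up.
        System.out.println("long checkpoint, request served: " + requestServed(500, 50));
    }
}
```

In this model the request fails whenever the checkpoint duration exceeds the timeout, which matches the report: a namespace save of more than 5 minutes is far longer than a DataNode is willing to wait, so block reports and heartbeats are lost on the Standby NameNode.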
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)