Xinwei Qin created HDFS-7876:
---------------------------------
Summary: DataNodes start to scan blocks earlier
Key: HDFS-7876
URL: https://issues.apache.org/jira/browse/HDFS-7876
Project: Hadoop HDFS
Issue Type: Improvement
Components: datanode, namenode
Affects Versions: 3.0.0
Reporter: Xinwei Qin
Assignee: Xinwei Qin
When a Hadoop cluster restarts, DataNodes scan their local blocks and report
this information to the NameNode. DataNodes start scanning local blocks only
after obtaining the NamespaceInfo from the NameNode via the RPC call
versionRequest(), which requires the NameNode RPC server to be up.
Currently, the RPC server is not created and started until the FsImage has
finished loading. So DataNodes cannot start scanning blocks immediately and
must wait for the NameNode to load the FsImage. When the FsImage is very
large, DataNodes spend this time idle.
Since the RPC server has very little dependence on the FsImage, and the
NamespaceInfo (namespaceID, clusterID, blockpoolID, cTime, etc.) can be
constructed from the VERSION file, we can create and start the RPC server
before loading the FsImage. DataNodes can then get the NamespaceInfo from the
NameNode via RPC as soon as possible and start scanning blocks earlier, which
will shorten restart time.
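To illustrate that these fields are available without the FsImage, here is a
minimal sketch that reads them from the text of a VERSION file, which is a
Java properties file. It is not the actual NameNode code: the class
VersionFileSketch, the holder type NamespaceFields, and the helpers parse()
and required() are hypothetical names used only for this example, and the
sample values are made up.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class VersionFileSketch {

    /** Hypothetical stand-in for the fields carried by NamespaceInfo. */
    static final class NamespaceFields {
        final int namespaceID;
        final String clusterID;
        final String blockpoolID;
        final long cTime;

        NamespaceFields(int namespaceID, String clusterID,
                        String blockpoolID, long cTime) {
            this.namespaceID = namespaceID;
            this.clusterID = clusterID;
            this.blockpoolID = blockpoolID;
            this.cTime = cTime;
        }
    }

    /** Parses the text of a VERSION file; no FsImage access is needed. */
    static NamespaceFields parse(String versionFileText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(versionFileText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new NamespaceFields(
            Integer.parseInt(required(props, "namespaceID")),
            required(props, "clusterID"),
            required(props, "blockpoolID"),
            Long.parseLong(required(props, "cTime")));
    }

    /** Returns the trimmed value for key, or fails if the key is absent. */
    static String required(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException("VERSION file is missing " + key);
        }
        return value.trim();
    }

    public static void main(String[] args) {
        // Made-up sample contents; real IDs are generated at format time.
        String sample = String.join("\n",
            "namespaceID=12345",
            "clusterID=CID-example",
            "cTime=0",
            "storageType=NAME_NODE",
            "blockpoolID=BP-example",
            "layoutVersion=-60");
        NamespaceFields f = parse(sample);
        System.out.println(f.namespaceID + " " + f.clusterID + " "
            + f.blockpoolID + " " + f.cTime);
    }
}
```

In the proposed ordering, something equivalent to parse() would run before the
RPC server starts, so that versionRequest() can be answered while the FsImage
is still loading.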
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)