[ https://issues.apache.org/jira/browse/HDFS-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346310#comment-14346310 ]
Xinwei Qin commented on HDFS-7876:
-----------------------------------
Hi, [~drankye], thanks for your comment.
{quote}
I guess you need to be careful to avoid race conditions like some operations
happening from RPC that rely on but before the loading completion of fsimage.
{quote}
RPC operations that depend on the fsimage and arrive before it has finished
loading will not run, because they are protected by the checkNNStartup()
method, which rejects calls until the NameNode has finished starting up.
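To illustrate the guard, here is a minimal Java sketch, assuming a simplified server with a single "started" flag that is set once the fsimage is loaded. SketchNameNodeRpc, NotStartedException, getFileInfo() as the example guarded call, and markImageLoaded() are all hypothetical; only the idea of a checkNNStartup() check comes from this discussion. versionRequest() is left unguarded here because, under this proposal, it only needs data from the VERSION file.

```java
import java.io.IOException;

// Hypothetical exception type; stands in for whatever the real guard throws.
class NotStartedException extends IOException {
    NotStartedException(String message) {
        super(message);
    }
}

// Hypothetical, simplified RPC server; not the real NameNodeRpcServer.
class SketchNameNodeRpc {
    // Set to true only once the fsimage has finished loading.
    private volatile boolean started = false;

    // Plays the role of checkNNStartup(): reject the call until startup completes.
    private void checkNNStartup() throws NotStartedException {
        if (!started) {
            throw new NotStartedException("NameNode still not started");
        }
    }

    // Needs only VERSION-file data, so it is served before the image is loaded.
    String versionRequest() {
        return "namespaceInfo";
    }

    // Depends on the namespace, so it is guarded.
    String getFileInfo(String path) throws NotStartedException {
        checkNNStartup();
        return "status of " + path;
    }

    // Called by the (hypothetical) startup code after the fsimage has been loaded.
    void markImageLoaded() {
        started = true;
    }
}
```

A rejected caller can retry later; nothing that reads the namespace runs until markImageLoaded() has been called.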
bq. Is it ok if a DN finishes scanning blocks and starts to report blocks while NN is still loading image?
After it finishes scanning blocks, the DN tries to register with the NN. As I
said above, registration will not succeed until the fsimage has finished
loading, because it is also guarded by checkNNStartup(). Block reports are only
sent after the DN has registered successfully, and the block report RPC also
goes through checkNNStartup(), so block reporting and image loading cannot
happen at the same time.
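The ordering argument can be checked with a small, deterministic Java simulation. Everything here is hypothetical: SimulatedNameNode pretends its image finishes loading after a fixed number of guarded calls, and SimulatedDataNode retries registration without sleeping. It only shows that, when registration and block reports sit behind the same startup check, a block report cannot be accepted before the image is loaded.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical NameNode stand-in whose image "finishes loading" after a few guarded calls.
class SimulatedNameNode {
    private final int callsUntilStarted; // hypothetical knob for the simulated load time
    private int calls = 0;
    private boolean started = false;
    final List<String> events = new ArrayList<>();

    SimulatedNameNode(int callsUntilStarted) {
        this.callsUntilStarted = callsUntilStarted;
    }

    // Stand-in for checkNNStartup(): advance the simulated load, then reject if not done.
    private void checkNNStartup() throws IOException {
        calls++;
        if (!started && calls > callsUntilStarted) {
            started = true;
            events.add("image loaded");
        }
        if (!started) {
            throw new IOException("NameNode still not started");
        }
    }

    void register(String dn) throws IOException {
        checkNNStartup();
        events.add("registered " + dn);
    }

    void blockReport(String dn, int blocks) throws IOException {
        checkNNStartup();
        events.add("block report from " + dn + ": " + blocks + " blocks");
    }
}

// Hypothetical DataNode side: block scanning is done, then register (retrying), then report.
class SimulatedDataNode {
    // Returns the attempt number on which registration succeeded.
    static int registerWithRetry(SimulatedNameNode nn, String dn, int maxAttempts) throws IOException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                nn.register(dn);
                return attempt;
            } catch (IOException notStarted) {
                // A real client would typically back off here; omitted to stay deterministic.
            }
        }
        throw new IOException("gave up registering " + dn);
    }

    static void run(SimulatedNameNode nn, String dn, int scannedBlocks) throws IOException {
        registerWithRetry(nn, dn, 10);
        // The block report is only sent after a successful registration.
        nn.blockReport(dn, scannedBlocks);
    }
}
```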
> DataNodes start to scan blocks earlier
> --------------------------------------
>
> Key: HDFS-7876
> URL: https://issues.apache.org/jira/browse/HDFS-7876
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode, namenode
> Affects Versions: 3.0.0
> Reporter: Xinwei Qin
> Assignee: Xinwei Qin
>
> When a Hadoop cluster restarts, DataNodes scan their local blocks and report
> this information to the NameNode. DataNodes start scanning local blocks only
> after obtaining the NamespaceInfo from the NameNode via the versionRequest()
> RPC call, which requires the NameNode RPC server to be up.
> Currently, the RPC server is not created and started until the FsImage has
> finished loading. So DataNodes cannot start scanning blocks immediately, and
> must wait for the NameNode to load the FsImage. This wastes DataNode time
> when the FsImage is very large.
> Since the RPC server depends very little on the FsImage, and the
> NamespaceInfo (namespaceID, clusterID, blockpoolID, cTime, etc.) can be
> constructed from the VERSION file, we can create and start the RPC server
> before loading the FsImage. DataNodes can then get the NamespaceInfo from the
> NameNode via RPC as soon as possible and start scanning blocks earlier, which
> will shorten restart time.
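The point that NamespaceInfo can be built without the FsImage is easy to illustrate: the NameNode VERSION file is a Java properties file, so the listed fields can be read with java.util.Properties. This is a sketch under that assumption; VersionInfo is a hypothetical stand-in for NamespaceInfo, and a real implementation would likely also check fields such as layoutVersion and storageType.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical holder mirroring the NamespaceInfo fields named in the issue.
final class VersionInfo {
    final int namespaceID;
    final String clusterID;
    final String blockPoolID;
    final long cTime;

    VersionInfo(int namespaceID, String clusterID, String blockPoolID, long cTime) {
        this.namespaceID = namespaceID;
        this.clusterID = clusterID;
        this.blockPoolID = blockPoolID;
        this.cTime = cTime;
    }

    // Parses the key=value contents of a NameNode VERSION file (properties format).
    static VersionInfo fromVersionFile(String contents) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(contents));
        return new VersionInfo(
                Integer.parseInt(required(props, "namespaceID")),
                required(props, "clusterID"),
                required(props, "blockpoolID"),
                Long.parseLong(required(props, "cTime")));
    }

    // Fails loudly instead of building a partial NamespaceInfo-like object.
    private static String required(Properties props, String key) throws IOException {
        String value = props.getProperty(key);
        if (value == null) {
            throw new IOException("VERSION file is missing " + key);
        }
        return value.trim();
    }
}
```

Because nothing here touches the FsImage, this step could run before the image is loaded, which is what would let the RPC server answer versionRequest() early.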