[ 
https://issues.apache.org/jira/browse/HDFS-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358020#comment-14358020
 ] 

Vinayakumar B commented on HDFS-7876:
-------------------------------------

Hi [~xinwei],
The idea looks fine, but we need to check whether it has any other impact.
{{checkNNStartup()}} was added just to reduce the race between the RPC server start and
the common services start.

This improvement is needed only when an entire large cluster is restarted.

With this change, most client requests will get an IOException from
{{checkNNStartup()}}, for which retries may not happen. As a result, client
operations may fail.
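
To illustrate the concern, here is a minimal sketch. It is not the actual NameNodeRpcServer code: the class {{StartupGuardSketch}}, the helpers {{markStarted()}} and {{callOnce()}}, the exception message, and the simplified {{getFileInfo()}} signature are all hypothetical. It assumes only that the guard throws a plain IOException until startup completes, that {{versionRequest()}} is answered early as the patch proposes, and that the client does not retry.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the NameNode RPC layer, for illustration only.
class StartupGuardSketch {
    // Flipped once common services are up (after FsImage loading in this sketch).
    private final AtomicBoolean started = new AtomicBoolean(false);

    void markStarted() {
        started.set(true);
    }

    // Plays the role of checkNNStartup(): reject calls that arrive too early.
    private void checkNNStartup() throws IOException {
        if (!started.get()) {
            throw new IOException("NameNode not started yet (hypothetical message)");
        }
    }

    // DataNode-facing call. Assumption: with the proposed change it is served
    // before FsImage loading, from data in the VERSION file, so it is unguarded.
    String versionRequest() {
        return "namespaceInfo-from-VERSION";
    }

    // Client-facing call (simplified signature): guarded, so it fails early.
    String getFileInfo(String path) throws IOException {
        checkNNStartup();
        return "status:" + path;
    }

    // A client that makes a single attempt and does not retry on IOException.
    static String callOnce(StartupGuardSketch nn, String path) {
        try {
            return nn.getFileInfo(path);
        } catch (IOException e) {
            return "FAILED: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        StartupGuardSketch nn = new StartupGuardSketch();
        // RPC server is up, FsImage still loading.
        System.out.println(nn.versionRequest());   // DataNode can proceed
        System.out.println(callOnce(nn, "/tmp"));  // client operation fails
        nn.markStarted();
        System.out.println(callOnce(nn, "/tmp"));  // succeeds after startup
    }
}
```

The longer the window between RPC server start and {{markStarted()}}, the more client calls land in the failing branch, which is the trade-off raised here.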

IMO, when restarting an entire large cluster, it's okay to wait this little
extra time rather than create problems on the client side.

> DataNodes start to scan blocks earlier
> --------------------------------------
>
>                 Key: HDFS-7876
>                 URL: https://issues.apache.org/jira/browse/HDFS-7876
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, namenode
>    Affects Versions: 3.0.0
>            Reporter: Xinwei Qin 
>            Assignee: Xinwei Qin 
>         Attachments: HDFS-7876.001.patch
>
>
> When a Hadoop cluster restarts, DataNodes scan their local blocks and report
> this information to the NameNode. DataNodes start to scan local blocks after
> obtaining the NamespaceInfo from the NameNode via the RPC call versionRequest(),
> which requires the NameNode RPC server to be up.
> Currently, the RPC server is not created and started until FsImage loading
> completes. So DataNodes cannot start to scan blocks immediately and
> must wait for the NameNode to load the FsImage. This wastes DataNode time
> when the FsImage is very large.
> Since the RPC server depends very little on the FsImage, and the
> NamespaceInfo (namespaceID, clusterID, blockpoolID, cTime, etc.) can be
> constructed from the VERSION file, we can create and start the RPC server before
> loading the FsImage, so that DataNodes can get the NamespaceInfo from the NameNode via
> RPC as soon as possible and start to scan blocks earlier, which will
> shorten restart time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
