[jira] [Commented] (HDFS-8578) On upgrade, Datanode should process all storage/data dirs in parallel

Tsz Wo Nicholas Sze (JIRA) Thu, 10 Dec 2015 14:48:51 -0800

    [ https://issues.apache.org/jira/browse/HDFS-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15051796#comment-15051796 ]


Tsz Wo Nicholas Sze commented on HDFS-8578:
-------------------------------------------

Thanks [~vinayrpet] for the new patch.

I think we have to keep addStorageLocations synchronized, since it involves 
a lot of code.  It is very hard to keep synchronization correct if 
addStorageLocations is not synchronized.  If there is a synchronization bug, 
the datanode's in-memory state may become inconsistent and result in data 
loss.  So we should be very careful here.

It would be great if we could process (including load, upgrade, recover) all 
the directories in parallel.  However, it is hard to get everything correct.  
Let's focus only on upgrade, since that is our problem today.

I played around with the code.  It seems that it is relatively easy to run 
upgrade in parallel since the hardlink-related code is mostly static.  I will 
work on a patch based on Vinay's patch.
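
A minimal sketch of that idea, not the actual patch: the outer method stays 
synchronized, and only the per-directory upgrade step is handed to a thread 
pool.  The class ParallelUpgradeSketch, the helper upgradeOneDir, and the 
signature of addStorageLocations used here are hypothetical stand-ins for 
illustration; they are not the real DataStorage API.  The sketch assumes the 
upgrade work touches no shared state, which is what makes it safe to run 
concurrently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelUpgradeSketch {
  // In-memory state; only modified while holding this object's lock.
  private final List<String> upgradedDirs = new ArrayList<>();

  // Hypothetical stand-in for the per-directory upgrade work (hard linking
  // and so on). It is static and uses no shared state.
  static String upgradeOneDir(String dir) {
    return dir + ":upgraded";
  }

  // The method itself remains synchronized; only the upgrade step fans out.
  // numThreads must be at least 1.
  public synchronized List<String> addStorageLocations(
      List<String> dirs, int numThreads)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    try {
      List<Future<String>> futures = new ArrayList<>();
      for (final String dir : dirs) {
        Callable<String> task = () -> upgradeOneDir(dir);
        futures.add(pool.submit(task));
      }
      // Collect results in submission order. Shared state is updated only
      // here, on the calling thread, which still holds the lock.
      for (Future<String> f : futures) {
        upgradedDirs.add(f.get());
      }
    } finally {
      pool.shutdown();
    }
    return new ArrayList<>(upgradedDirs);
  }

  public static void main(String[] args) throws Exception {
    ParallelUpgradeSketch storage = new ParallelUpgradeSketch();
    List<String> dirs = Arrays.asList("/data1", "/data2", "/data3");
    System.out.println(storage.addStorageLocations(dirs, dirs.size()));
  }
}
```

The worker tasks only compute; every change to the shared list happens on 
the thread that holds the lock, so the synchronization of the outer method 
does not have to be revisited.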

> On upgrade, Datanode should process all storage/data dirs in parallel
> ---------------------------------------------------------------------
>
>                 Key: HDFS-8578
>                 URL: https://issues.apache.org/jira/browse/HDFS-8578
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>            Reporter: Raju Bairishetti
>            Assignee: Vinayakumar B
>            Priority: Critical
>         Attachments: HDFS-8578-01.patch, HDFS-8578-02.patch, 
> HDFS-8578-03.patch, HDFS-8578-04.patch, HDFS-8578-05.patch, 
> HDFS-8578-06.patch, HDFS-8578-07.patch, HDFS-8578-08.patch, 
> HDFS-8578-09.patch, HDFS-8578-10.patch, HDFS-8578-11.patch, 
> HDFS-8578-12.patch, HDFS-8578-13.patch, HDFS-8578-14.patch, 
> HDFS-8578-15.patch, HDFS-8578-branch-2.6.0.patch, 
> HDFS-8578-branch-2.7-001.patch, HDFS-8578-branch-2.7-002.patch, 
> HDFS-8578-branch-2.7-003.patch
>
>
> Right now, during upgrades the datanode processes all the storage dirs 
> sequentially. Assuming it takes ~20 mins to process a single storage dir, a 
> datanode with ~10 disks will take around 3 hours to come up.
> *BlockPoolSliceStorage.java*
> {code}
>    for (int idx = 0; idx < getNumStorageDirs(); idx++) {
>      doTransition(datanode, getStorageDir(idx), nsInfo, startOpt);
>      assert getCTime() == nsInfo.getCTime()
>          : "Data-node and name-node CTimes must be the same.";
>    }
> {code}
> It would save a lot of time during major upgrades if the datanode processed 
> all storage dirs/disks in parallel.
> Can we make the datanode process all storage dirs in parallel?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
