[jira] [Commented] (HDFS-8578) On upgrade, Datanode should process all storage/data dirs in parallel

Vinayakumar B (JIRA) Thu, 05 Nov 2015 17:39:46 -0800

    [ 
https://issues.apache.org/jira/browse/HDFS-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14992898#comment-14992898
 ]


Vinayakumar B commented on HDFS-8578:
-------------------------------------

bq. I skimmed patch v10, and it seems you do not modify the format method. So how do 
you deal with concurrent modification of layoutVersion and other 
properties? layoutVersion could also be changed in other places. Also, it seems 
the other properties are always assigned the same values, so could we move this to 
another place that executes only once? The code is a little confusing right 
now...
There will be only one DataStorage instance per DN, and the values assigned in 
{{DataStorage.format()}} will be the same for all directories, because they are 
not read from disk. {{layoutVersion}} comes from the code, and the other values come 
from the NamespaceInfo sent by the namenode. Of the DataStorage properties in the 
VERSION file below, all except {{storageID}} (which is not a field of DataStorage) 
will be the same for all directories after format/upgrade. Below are the properties 
from DataStorage's VERSION file.
{noformat}
storageID=DS-8a5170b7-a105-45cd-b9b2-9d01c160e11f
clusterID=testClusterID
cTime=0
datanodeUuid=35e3d456-c507-47c8-aaf9-54e77ce49ce0
storageType=DATA_NODE
layoutVersion=-56
{noformat}

So I don't think synchronization is required for the properties of 
DataStorage.
The only requirement is that datanodeUuid, which is used later, has to be read 
before the directories are processed in parallel.
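
To illustrate the ordering, here is a minimal sketch and not the patch code. The class {{ParallelStorageUpgradeSketch}} and the methods {{upgradeDir}} and {{upgradeAll}} are hypothetical names used only for this example. The shared value is read once before any task is submitted, and each task only reads it, so no locking is needed on it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration; none of these names come from the HDFS code base.
class ParallelStorageUpgradeSketch {

  // Stand-in for the per-directory work. It only reads the shared value.
  static String upgradeDir(String dir, String datanodeUuid) {
    return dir + " upgraded for " + datanodeUuid;
  }

  // datanodeUuid must already be known here, i.e. read before going parallel.
  static List<String> upgradeAll(List<String> dirs, final String datanodeUuid)
      throws InterruptedException, ExecutionException {
    // One thread per directory; at least one so an empty list is accepted.
    ExecutorService pool =
        Executors.newFixedThreadPool(Math.max(1, dirs.size()));
    try {
      List<Future<String>> futures = new ArrayList<>();
      for (final String dir : dirs) {
        futures.add(pool.submit(() -> upgradeDir(dir, datanodeUuid)));
      }
      // Collect results in submission order; get() rethrows task failures.
      List<String> results = new ArrayList<>();
      for (Future<String> f : futures) {
        results.add(f.get());
      }
      return results;
    } finally {
      pool.shutdown();
    }
  }
}
```

Calling {{upgradeAll}} with the list of storage directories returns one result per directory, in the same order as the input list.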

Hope this clears up the confusion.

-Thanks

> On upgrade, Datanode should process all storage/data dirs in parallel
> ---------------------------------------------------------------------
>
>                 Key: HDFS-8578
>                 URL: https://issues.apache.org/jira/browse/HDFS-8578
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>            Reporter: Raju Bairishetti
>            Assignee: Vinayakumar B
>            Priority: Critical
>         Attachments: HDFS-8578-01.patch, HDFS-8578-02.patch, 
> HDFS-8578-03.patch, HDFS-8578-04.patch, HDFS-8578-05.patch, 
> HDFS-8578-06.patch, HDFS-8578-07.patch, HDFS-8578-08.patch, 
> HDFS-8578-09.patch, HDFS-8578-10.patch, HDFS-8578-branch-2.6.0.patch
>
>
> Right now, during upgrades, the datanode processes all the storage dirs 
> sequentially. Assuming it takes ~20 mins to process a single storage dir, a 
> datanode with ~10 disks will take around 3 hours to come up.
> *BlockPoolSliceStorage.java*
> {code}
>     for (int idx = 0; idx < getNumStorageDirs(); idx++) {
>       doTransition(datanode, getStorageDir(idx), nsInfo, startOpt);
>       assert getCTime() == nsInfo.getCTime() 
>           : "Data-node and name-node CTimes must be the same.";
>     }
> {code}
> It would save a lot of time during major upgrades if the datanode processed 
> all storage dirs/disks in parallel.
> Can we make the datanode process all storage dirs in parallel?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
