[
https://issues.apache.org/jira/browse/HDFS-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15051796#comment-15051796
]
Tsz Wo Nicholas Sze commented on HDFS-8578:
-------------------------------------------
Thanks [~vinayrpet] for the new patch.
I think we have to keep addStorageLocations synchronized since
addStorageLocations involves a lot of code. It is very hard to keep
synchronization correct if addStorageLocations is not synchronized. If there
is a synchronization bug, the datanode's in-memory state may become
inconsistent and result in data loss. So we should be very careful here.
It would be great if we could process (including load, upgrade, recover) all
the directories in parallel. However, it is hard to get everything correct.
Let's focus only on upgrade, since that is our problem today.
I played around with the code. It seems relatively easy to run upgrade in
parallel, since the hardlink-related code is mostly static. I will work on a
patch based on Vinay's patch.
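To illustrate the idea (this is only a rough sketch, not the actual patch): the
outer method stays synchronized, and only the per-directory upgrade work is
handed to a thread pool. The class ParallelUpgradeSketch, the helper
upgradeDir, and the use of plain strings for directories are all hypothetical;
addStorageLocations here is a simplified stand-in and does not match the real
method's signature.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelUpgradeSketch {
    // In-memory state; only touched while holding the object's lock.
    private final List<String> upgraded = new ArrayList<>();

    // Hypothetical stand-in for the per-directory upgrade (e.g. hard-linking).
    // It is static and uses no shared mutable state, so any thread may run it.
    static String upgradeDir(String dir) {
        return dir + ":upgraded";
    }

    // Still synchronized: callers see one atomic step, as before.
    public synchronized List<String> addStorageLocations(List<String> dirs) {
        ExecutorService pool =
            Executors.newFixedThreadPool(Math.max(1, dirs.size()));
        try {
            // Submit one upgrade task per directory.
            List<Future<String>> futures = new ArrayList<>();
            for (String dir : dirs) {
                Callable<String> task = () -> upgradeDir(dir);
                futures.add(pool.submit(task));
            }
            // Wait for every task; only this thread updates shared state.
            for (Future<String> f : futures) {
                upgraded.add(f.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Upgrade interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Upgrade failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return new ArrayList<>(upgraded);
    }
}
```

With this shape, the worker threads never touch the datanode's in-memory
state, so the existing locking does not need to be re-audited; the lock is
simply held while the parallel tasks run.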
> On upgrade, Datanode should process all storage/data dirs in parallel
> ---------------------------------------------------------------------
>
> Key: HDFS-8578
> URL: https://issues.apache.org/jira/browse/HDFS-8578
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Reporter: Raju Bairishetti
> Assignee: Vinayakumar B
> Priority: Critical
> Attachments: HDFS-8578-01.patch, HDFS-8578-02.patch,
> HDFS-8578-03.patch, HDFS-8578-04.patch, HDFS-8578-05.patch,
> HDFS-8578-06.patch, HDFS-8578-07.patch, HDFS-8578-08.patch,
> HDFS-8578-09.patch, HDFS-8578-10.patch, HDFS-8578-11.patch,
> HDFS-8578-12.patch, HDFS-8578-13.patch, HDFS-8578-14.patch,
> HDFS-8578-15.patch, HDFS-8578-branch-2.6.0.patch,
> HDFS-8578-branch-2.7-001.patch, HDFS-8578-branch-2.7-002.patch,
> HDFS-8578-branch-2.7-003.patch
>
>
> Right now, during upgrades, the datanode processes all the storage dirs
> sequentially. Assuming it takes ~20 mins to process a single storage dir, a
> datanode with ~10 disks will take around 3 hours to come up.
> *BlockPoolSliceStorage.java*
> {code}
> for (int idx = 0; idx < getNumStorageDirs(); idx++) {
>   doTransition(datanode, getStorageDir(idx), nsInfo, startOpt);
>   assert getCTime() == nsInfo.getCTime()
>       : "Data-node and name-node CTimes must be the same.";
> }
> {code}
> It would save a lot of time during major upgrades if the datanode processed
> all storage dirs/disks in parallel.
> Can we make the datanode process all storage dirs in parallel?