[
https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14055419#comment-14055419
]
Colin Patrick McCabe commented on HDFS-6482:
--------------------------------------------
bq. \[james wrote\]: Also, I don't think it makes sense to support both the
LDir structure and this structure simultaneously
The main reason to do this change is to save memory and simplify things by not
having to store the path to each replica. If we support the old layout, then
we no longer have this nice property. We could still get some of the gains by
setting the path to null in the various data structures... basically, a null
path would mean "this replica is located at a place determined by its block
ID," and a non-null path would mean using the old system. That is a possible
solution, but I would prefer not to go down this road because of the added
code complexity.
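To make the tradeoff concrete, here is a rough sketch of what the ID-based lookup buys us: the replica's directory can be computed from the block ID alone, so nothing has to be stored per replica. The two-level 32x32 split and the helper name are assumptions for illustration only, not necessarily what the patch implements.
{code:java}
// Illustrative only: derive a replica's directory purely from its block ID.
// The 32x32 two-level split is an assumption for this sketch, not
// necessarily the layout the patch implements.
public class BlockIdLayoutSketch {
  static String idToBlockDir(long blockId) {
    int d1 = (int) ((blockId >> 16) & 0x1F);  // first-level subdir, 0..31
    int d2 = (int) ((blockId >> 8) & 0x1F);   // second-level subdir, 0..31
    return "subdir" + d1 + "/subdir" + d2;
  }

  public static void main(String[] args) {
    long blockId = 1073741825L;
    // prints subdir0/subdir0/blk_1073741825 -- no per-replica path is stored
    System.out.println(idToBlockDir(blockId) + "/blk_" + blockId);
  }
}
{code}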
bq. \[suresh wrote\]: I think creating hard links with the new schema is an issue.
The main reason hardlinks are created the way they are today is to minimize the
impact of any bug in the new software. The simplest thing was done: we iterated
over directories and created hardlinks. Rollback must ensure the system goes
back to its previous state.
I don't see why a rollback wouldn't work here. It's the same as going from the
old (pre-Hadoop-2.0) layout to the new block pool-based layout. We used
hardlinks there to provide downgrade capability as well, and it worked.
We're not changing the contents of the old directory, just moving it out of the
way and hardlinking to the block and meta files within it.
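To illustrate the rollback point, here is a rough sketch (with assumed directory names and a hypothetical idToBlockDir helper, not the actual upgrade code) of the kind of pass I mean: the preserved "previous" tree is never modified, the new ID-based "current" tree is populated purely with hard links, and rollback just discards "current".
{code:java}
// Sketch, not the actual upgrade code: hard-link block and meta files from
// the untouched "previous" tree into the new ID-based "current" tree.
// Rollback simply discards "current", because the original files were
// never modified, only linked to.
import java.io.IOException;
import java.nio.file.*;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class HardLinkUpgradeSketch {
  static void linkBlocks(Path previousDir, Path currentDir) throws IOException {
    List<Path> blockFiles;
    try (Stream<Path> s = Files.walk(previousDir)) {
      blockFiles = s.filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().startsWith("blk_"))
                    .collect(Collectors.toList());
    }
    for (Path src : blockFiles) {
      String name = src.getFileName().toString();
      Path destDir = currentDir.resolve(idToBlockDir(parseBlockId(name)));
      Files.createDirectories(destDir);
      Files.createLink(destDir.resolve(name), src);  // hard link, no data copy
    }
  }

  // blk_<id> or blk_<id>_<genstamp>.meta
  static long parseBlockId(String name) {
    return Long.parseLong(name.split("_")[1].replace(".meta", ""));
  }

  // same hypothetical ID-to-directory mapping as the earlier sketch
  static String idToBlockDir(long blockId) {
    int d1 = (int) ((blockId >> 16) & 0x1F);
    int d2 = (int) ((blockId >> 8) & 0x1F);
    return "subdir" + d1 + "/subdir" + d2;
  }
}
{code}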
bq. James Thomas, we did a bunch of improvements to cut down the time from tens
of minutes to a couple of minutes. See HDFS-1445 for more details. Clearly,
anything significantly above 60s (the design goal for rolling upgrades) will
result in issues for rolling upgrades.
Yes. This is a very important consideration. James and I discussed a few ways
to optimize the hardlink process. I think that it's very possible for this to
be done in a second or two at most. If you assume 500,000 replicas spread over
10 drives, you have 50,000 hardlinks to make on each drive. This just isn't
going to take that long, since the operations you're doing are just altering
memory (we don't fsync after calling {{link}}). It's just a question of doing
it in a smart way that minimizes the number of {{exec}} calls we make (and
possibly exploits some parallelism).
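For the parallelism piece, the idea would be roughly the following (names and layout are assumptions carried over from the sketch above): give each volume its own thread and call Files.createLink() from Java directly, rather than exec'ing a process per file, so the per-replica cost is just a metadata update.
{code:java}
// Sketch of the parallelism idea only: link each volume's replicas on its
// own thread, using Files.createLink() directly so no external process is
// exec'd per file.
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class ParallelLinkSketch {
  static void upgradeAllVolumes(List<Path> volumes) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(volumes.size());
    try {
      List<Future<?>> pending = new ArrayList<>();
      for (Path vol : volumes) {
        pending.add(pool.submit(() -> {
          // hypothetical helper from the previous sketch
          HardLinkUpgradeSketch.linkBlocks(
              vol.resolve("previous"), vol.resolve("current"));
          return null;
        }));
      }
      for (Future<?> f : pending) {
        f.get();  // surface any per-volume failure
      }
    } finally {
      pool.shutdown();
    }
  }
}
{code}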
> Use block ID-based block layout on datanodes
> --------------------------------------------
>
> Key: HDFS-6482
> URL: https://issues.apache.org/jira/browse/HDFS-6482
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 2.5.0
> Reporter: James Thomas
> Assignee: James Thomas
> Attachments: 6482-design.doc, HDFS-6482.1.patch, HDFS-6482.2.patch,
> HDFS-6482.3.patch, HDFS-6482.4.patch, HDFS-6482.5.patch, HDFS-6482.6.patch,
> HDFS-6482.7.patch, HDFS-6482.patch
>
>
> Right now blocks are placed into directories that are split into many
> subdirectories when capacity is reached. Instead we can use a block's ID to
> determine the path it should go in. This eliminates the need for the LDir
> data structure that facilitates the splitting of directories when they reach
> capacity, as well as the fields in ReplicaInfo that keep track of a replica's
> location.
> An extension of the work in HDFS-3290.