[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14055419#comment-14055419 ]

Colin Patrick McCabe commented on HDFS-6482:
--------------------------------------------

bq. \[james wrote\]: Also, I don't think it makes sense to support both the 
LDir structure and this structure simultaneously

The main reason to make this change is to save memory and simplify things by 
not having to store the path to each replica.  If we support the old layout, 
then we no longer have that nice property.  We could still get some of the 
gains by setting the path to null in the various data structures... basically, 
null would mean "this replica is located at the place determined by its block 
ID," and non-null would mean the old system is in use.  That might be a 
workable solution, but I would prefer not to go down this road because of the 
extra code complexity.
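
For what it's worth, if we did go down that road, the hybrid lookup might look 
something like this (a rough sketch, not the actual ReplicaInfo code; all of 
the names here are made up for illustration):

{code:java}
import java.io.File;

// Rough sketch of the hybrid approach described above -- NOT the actual
// ReplicaInfo code.  A null parent directory means "this replica lives at
// the location determined by its block ID"; a non-null value means the old
// LDir-style stored path is still in use.
class HybridReplicaSketch {
  private final long blockId;
  private final File legacyDir;   // null => block-ID-based layout

  HybridReplicaSketch(long blockId, File legacyDir) {
    this.blockId = blockId;
    this.legacyDir = legacyDir;
  }

  File getDir(File volumeRoot) {
    if (legacyDir != null) {
      return legacyDir;   // old layout: the path was stored explicitly
    }
    // New layout: the path is a pure function of the block ID, so nothing
    // per-replica needs to be stored.
    return deriveDirFromBlockId(volumeRoot, blockId);
  }

  static File deriveDirFromBlockId(File root, long id) {
    // Stand-in for the full ID-to-directory mapping (a fuller sketch
    // appears at the end of this message).
    return new File(root, "subdir" + ((id >> 16) & 0x1F));
  }
}
{code}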

bq. \[suresh wrote\]: I think creating hard links with the new schema is an 
issue.  The main reason hardlinks are created the way they are today is to 
minimize the impact of any bug in the new software.  The simplest thing was 
done: we iterated over directories and created hardlinks.  Rollback must 
ensure the system goes back to its previous state.

I don't see why a rollback wouldn't work here.  It's the same as going from 
the old (pre-Hadoop-2.0) layout to the new block pool-based layout.  We used 
hardlinks there as well to provide downgrade capability, and it worked.  We're 
not changing the contents of the old directory, just moving it out of the way 
and hardlinking to the block and meta files within it.
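
Concretely, the upgrade can follow the same pattern as the block pool upgrade: 
rename the old tree out of the way, then hard-link each block and meta file 
into its new home.  The sketch below shows the idea; the directory names and 
the {{newLocationFor}} helper are illustrative, not the actual DataStorage 
code:

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch of a hardlink-based layout upgrade.  The old tree is only renamed
// and linked to, never modified, so rollback is just renaming it back.
class UpgradeSketch {
  static void upgrade(Path volumeRoot) throws IOException {
    Path previous = volumeRoot.resolve("previous");
    Path current = volumeRoot.resolve("current");

    // Move the old tree out of the way.
    Files.move(current, previous);
    Files.createDirectories(current);

    // Find every block and meta file in the old tree (both start with "blk_").
    List<Path> replicaFiles;
    try (Stream<Path> files = Files.walk(previous)) {
      replicaFiles = files
          .filter(p -> p.getFileName().toString().startsWith("blk_"))
          .collect(Collectors.toList());
    }

    // Hard-link each file into its new, block-ID-derived location.
    for (Path src : replicaFiles) {
      Path dst = newLocationFor(src, current);
      Files.createDirectories(dst.getParent());
      Files.createLink(dst, src);   // same inode; no data is copied
    }
  }

  // Placeholder: the real change would hash the block ID into a fixed
  // two-level directory structure.
  static Path newLocationFor(Path src, Path current) {
    return current.resolve(src.getFileName());
  }
}
{code}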

bq. James Thomas, we did a bunch of improvements to cut the time down from 
tens of minutes to a couple of minutes.  See HDFS-1445 for more details.  
Clearly, anything significantly above 60s (the design goal for rolling 
upgrades) will result in issues for rolling upgrades.

Yes.  This is a very important consideration.  James and I discussed a few 
ways to optimize the hardlink process.  I think it's very possible for this to 
be done in a second or two at most.  If you assume 500,000 replicas spread 
over 10 drives, you have 50,000 hardlinks to make on each drive.  That just 
isn't going to take long, since the operations involved only alter in-memory 
state (we don't fsync after calling {{link}}).  It's just a question of doing 
it in a smart way that minimizes the number of {{exec}} calls we make (and 
possibly exploits some parallelism).
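
For what it's worth, on a Java 7+ runtime the {{exec}} calls can be avoided 
entirely: {{java.nio.file.Files.createLink}} issues the link syscall directly 
from the JVM.  A rough sketch of the per-volume parallelism (illustrative 
only, not the patch's code):

{code:java}
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: one worker per volume, each creating its ~50,000 links with
// direct syscalls instead of fork/exec'ing "ln".
class ParallelHardLink {
  // Each inner list holds {linkToCreate, existingFile} pairs for one volume.
  static void linkAll(List<List<Path[]>> perVolumeWork) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(perVolumeWork.size());
    try {
      List<Future<?>> futures = new ArrayList<>();
      for (List<Path[]> volumeWork : perVolumeWork) {
        futures.add(pool.submit(() -> {
          for (Path[] pair : volumeWork) {
            Files.createLink(pair[0], pair[1]);  // one link(2) call, no exec
          }
          return null;
        }));
      }
      for (Future<?> f : futures) {
        f.get();  // surface any failure from the workers
      }
    } finally {
      pool.shutdown();
    }
  }
}
{code}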

> Use block ID-based block layout on datanodes
> --------------------------------------------
>
>                 Key: HDFS-6482
>                 URL: https://issues.apache.org/jira/browse/HDFS-6482
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 2.5.0
>            Reporter: James Thomas
>            Assignee: James Thomas
>         Attachments: 6482-design.doc, HDFS-6482.1.patch, HDFS-6482.2.patch, 
> HDFS-6482.3.patch, HDFS-6482.4.patch, HDFS-6482.5.patch, HDFS-6482.6.patch, 
> HDFS-6482.7.patch, HDFS-6482.patch
>
>
> Right now, blocks are placed into directories that are split into many 
> subdirectories when capacity is reached.  Instead, we can use a block's ID 
> to determine the path it should go in.  This eliminates the need for the 
> LDir data structure that facilitates splitting directories when they reach 
> capacity, as well as the fields in ReplicaInfo that keep track of a 
> replica's location.
> An extension of the work in HDFS-3290.
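
For illustration, the block-ID-to-directory mapping described above could look 
something like the sketch below (the exact bit layout in the committed patch 
may differ):

{code:java}
import java.io.File;

// Illustrative sketch: derive a replica's directory purely from its block
// ID -- two fixed levels of 32 subdirectories each, keyed off bits of the ID.
class IdToDirSketch {
  static File idToBlockDir(File finalizedDir, long blockId) {
    int d1 = (int) ((blockId >> 16) & 0x1F);  // 32 top-level subdirs
    int d2 = (int) ((blockId >> 8) & 0x1F);   // 32 subdirs under each
    return new File(finalizedDir,
        "subdir" + d1 + File.separator + "subdir" + d2);
  }

  public static void main(String[] args) {
    // e.g. block 1073741825 lands in finalized/subdir0/subdir0
    System.out.println(idToBlockDir(new File("finalized"), 1073741825L));
  }
}
{code}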



--
This message was sent by Atlassian JIRA
(v6.2#6252)
