Re: Copy on write for HDFS

Owen O'Malley Mon, 16 Jul 2007 08:01:24 -0700


On Jul 15, 2007, at 11:08 PM, Dhruba Borthakur wrote:

I guess what you are saying is that a block can belong to multiple files.

A better name for the feature would be "clone," I think. And yes, it would be a file copy that is cheap since it doesn't involve moving any data. It only updates structures on the NameNode.

1. File deletion: In the current code, when a file is deleted, all blocks belonging to that file are scheduled for deletion. This code has to change in such a way that a block gets deleted only if it does not belong to *any* file.

There would either need to be a ref count on the blocks or a reverse mapping of blocks to sets of files. And yes, you can only delete the block if the set of files is empty or the ref count goes to 0. A more invasive change is that the desired replication of the block is the maximum of the replications of the containing files. I assume that means that you would need to store the desired replication on each block rather than in the file information.
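A minimal sketch of the reverse-mapping variant, in Java. The class BlockOwners and its methods are hypothetical names made up for illustration and are not part of the HDFS code. It assumes blocks are identified by a long id and files by a path string, and it derives the block's desired replication as the maximum over the containing files on demand rather than caching it on the block.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: a reverse mapping from a block id to the
// files that contain it, with each file's desired replication.
class BlockOwners {
    // block id -> (file path -> desired replication of that file)
    private final Map<Long, Map<String, Integer>> owners = new HashMap<>();

    // Record that `file` contains `blockId` and wants `replication` copies.
    void addOwner(long blockId, String file, int replication) {
        owners.computeIfAbsent(blockId, k -> new HashMap<>()).put(file, replication);
    }

    // Remove `file` from the block's owner set. Returns true only when the
    // set became empty, i.e. the block may now be scheduled for deletion.
    boolean removeOwner(long blockId, String file) {
        Map<String, Integer> files = owners.get(blockId);
        if (files == null) {
            return false; // unknown block: nothing to schedule
        }
        files.remove(file);
        if (files.isEmpty()) {
            owners.remove(blockId);
            return true;
        }
        return false;
    }

    // Desired replication of a block: the maximum over its containing files,
    // or 0 if no file references the block.
    int desiredReplication(long blockId) {
        Map<String, Integer> files = owners.get(blockId);
        if (files == null) {
            return 0;
        }
        int max = 0;
        for (int r : files.values()) {
            max = Math.max(max, r);
        }
        return max;
    }
}
```

With a plain ref count instead of the owner set, the maximum could not be recomputed when a file goes away, which is one reason the replication would then have to be tracked on the block itself.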

2. race between cow() and delete(): The client invokes cow() with a set of LocatedBlocks. Since there aren't any client side locks, by the time the Namenode processes the cow() command, the original block(s) could have been deleted.

The right interface in my opinion is not that you give blocks at all, but do the clone at the file level.


void cloneFile(Path source, Path destination) throws IOException

or something. Then the namespace can be locked while the data structures are read and modified.
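A rough sketch of why the file-level call avoids the race, in Java. ToyNamespace is a hypothetical in-memory stand-in, not the real NameNode classes. It assumes a single namespace-wide lock, per-block ref counts, and String paths in place of Path so the example stays self-contained.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical toy namespace used only to illustrate the locking argument.
class ToyNamespace {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, List<Long>> fileBlocks = new HashMap<>();
    private final Map<Long, Integer> refCounts = new HashMap<>();

    // Register a new file with its block list (assumes the path is unused).
    void createFile(String path, List<Long> blocks) {
        lock.writeLock().lock();
        try {
            fileBlocks.put(path, new ArrayList<>(blocks));
            for (long b : blocks) {
                refCounts.merge(b, 1, Integer::sum);
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    // The block list is looked up and shared while the lock is held, so a
    // concurrent delete() cannot free the blocks halfway through the clone.
    void cloneFile(String source, String destination) throws IOException {
        lock.writeLock().lock();
        try {
            List<Long> blocks = fileBlocks.get(source);
            if (blocks == null) {
                throw new FileNotFoundException(source);
            }
            if (fileBlocks.containsKey(destination)) {
                throw new IOException("destination exists: " + destination);
            }
            fileBlocks.put(destination, new ArrayList<>(blocks));
            for (long b : blocks) {
                refCounts.merge(b, 1, Integer::sum);
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Remove a file and return the blocks no file references any more.
    List<Long> delete(String path) {
        lock.writeLock().lock();
        try {
            List<Long> freed = new ArrayList<>();
            List<Long> blocks = fileBlocks.remove(path);
            if (blocks == null) {
                return freed;
            }
            for (long b : blocks) {
                int remaining = refCounts.merge(b, -1, Integer::sum);
                if (remaining == 0) {
                    refCounts.remove(b);
                    freed.add(b);
                }
            }
            return freed;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Because the client passes only paths, it never holds a block list that can go stale between its read and the clone request.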


-- Owen
