On Jul 15, 2007, at 11:08 PM, Dhruba Borthakur wrote:

I guess what you are saying is that a block can belong to multiple files.

A better name for the feature would be "clone," I think. And yes, it would be a file copy that is cheap since it doesn't involve moving any data. It only updates structures on the NameNode.

1. File deletion: In the current code, when a file is deleted, all blocks belonging to that file are scheduled for deletion. This code has to change in such a way that a block gets deleted only if it does not belong to *any* file.

There would either need to be a ref count on the blocks or a reverse mapping of blocks to sets of files. And yes, you can only delete the block if the set of files is empty or the ref count goes to 0. A more invasive change is that the desired replication of the block is the maximum of the replications of the containing files. I assume that means that you would need to store the desired replication on each block rather than in the file information.
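To make the reverse-mapping option concrete, here is a rough sketch. Every name in it (BlockOwners, addFile, deleteFile, desiredReplication) is made up for illustration and is not taken from the NameNode code. It also assumes the reverse mapping is available, so the desired replication can be derived on demand as the maximum over the owning files; with a bare ref count that value would have to be stored on the block instead.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these names come from the HDFS code base.
class BlockOwners {
    // Reverse mapping: block id -> paths of the files that contain the block.
    private final Map<Long, Set<String>> owners = new HashMap<>();
    // Forward mapping: file path -> ids of the blocks it contains.
    private final Map<String, long[]> fileBlocks = new HashMap<>();
    // Desired replication, kept per file.
    private final Map<String, Integer> fileReplication = new HashMap<>();

    void addFile(String path, int replication, long... blockIds) {
        fileBlocks.put(path, blockIds.clone());
        fileReplication.put(path, replication);
        for (long id : blockIds) {
            owners.computeIfAbsent(id, k -> new HashSet<>()).add(path);
        }
    }

    // Removes the file and returns only the blocks that no file references any more.
    Set<Long> deleteFile(String path) {
        Set<Long> deletable = new HashSet<>();
        long[] blocks = fileBlocks.remove(path);
        fileReplication.remove(path);
        if (blocks == null) {
            return deletable; // unknown file: nothing to schedule
        }
        for (long id : blocks) {
            Set<String> files = owners.get(id);
            if (files == null) {
                continue; // already handled (same block listed twice in the file)
            }
            files.remove(path);
            if (files.isEmpty()) {
                owners.remove(id);
                deletable.add(id);
            }
        }
        return deletable;
    }

    // Desired replication of a block: the maximum over all files that contain it.
    int desiredReplication(long blockId) {
        int max = 0;
        for (String path : owners.getOrDefault(blockId, new HashSet<>())) {
            max = Math.max(max, fileReplication.get(path));
        }
        return max;
    }
}
```

With a block shared by a file at replication 3 and a clone at replication 5, desiredReplication reports 5 until the clone is deleted, and the block only shows up in the deletable set once the last owner is gone.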

2. Race between cow() and delete(): The client invokes cow() with a set of LocatedBlocks. Since there aren't any client-side locks, by the time the NameNode processes the cow() command, the original block(s) could have been deleted.

The right interface, in my opinion, does not take blocks from the client at all; it does the clone at the file level:

void cloneFile(Path source, Path destination) throws IOException

or something. Then the namespace can be locked while the data structures are read and modified.
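A rough sketch of what I mean by locking the namespace, under some loud assumptions: ToyNamespace and its methods are hypothetical, paths are plain Strings rather than Path, a single read/write lock stands in for whatever locking the NameNode really uses, and blocks carry a simple ref count. The point is only that cloneFile and delete take the same lock, so a clone either sees the whole source file or fails cleanly.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: not the NameNode's actual data structures.
class ToyNamespace {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, List<Long>> files = new HashMap<>();
    private final Map<Long, Integer> refCounts = new HashMap<>();

    void createFile(String path, List<Long> blocks) throws IOException {
        lock.writeLock().lock();
        try {
            if (files.containsKey(path)) {
                throw new IOException("already exists: " + path);
            }
            files.put(path, new ArrayList<>(blocks));
            for (Long b : blocks) {
                refCounts.merge(b, 1, Integer::sum);
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    // The clone reads the source and updates the structures under one lock,
    // so a concurrent delete cannot remove the blocks halfway through.
    void cloneFile(String source, String destination) throws IOException {
        lock.writeLock().lock();
        try {
            List<Long> blocks = files.get(source);
            if (blocks == null) {
                throw new FileNotFoundException(source);
            }
            if (files.containsKey(destination)) {
                throw new IOException("already exists: " + destination);
            }
            files.put(destination, new ArrayList<>(blocks));
            for (Long b : blocks) {
                refCounts.merge(b, 1, Integer::sum);
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Returns the blocks whose ref count dropped to 0 and can be deleted.
    List<Long> delete(String path) {
        lock.writeLock().lock();
        try {
            List<Long> freed = new ArrayList<>();
            List<Long> blocks = files.remove(path);
            if (blocks == null) {
                return freed;
            }
            for (Long b : blocks) {
                int remaining = refCounts.get(b) - 1;
                if (remaining == 0) {
                    refCounts.remove(b);
                    freed.add(b);
                } else {
                    refCounts.put(b, remaining);
                }
            }
            return freed;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

If the delete wins the race, the clone fails with FileNotFoundException; if the clone wins, the delete frees nothing because the clone still holds references.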

-- Owen
