On Jul 15, 2007, at 11:08 PM, Dhruba Borthakur wrote:
> I guess what you are saying is that a block can belong to multiple
> files.
A better name for the feature would be "clone," I think. And yes, it
would be a file copy that is cheap since it doesn't involve moving
any data. It only updates structures on the NameNode.
> 1. File deletion: In the current code, when a file is deleted, all blocks
> belonging to that file are scheduled for deletion. This code has to change
> in such a way that a block gets deleted only if it does not belong to *any*
> file.
There would either need to be a ref count on the blocks or a reverse
mapping of blocks to sets of files. And yes, you can only delete the
block if the set of files is empty or the ref count goes to 0. A more
invasive change is that the desired replication of the block is the
maximum of the replications of the containing files. I assume that
means that you would need to store the desired replication on each block
rather than in the file information.
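To make the bookkeeping concrete, here is a rough sketch of the reverse-mapping variant. All names (BlockFileMap, addFile, deleteFile, desiredReplication) are made up for illustration and are not existing NameNode classes. It also derives a block's desired replication from the owning files on demand rather than storing it on the block, which is just one possible choice.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: illustrative names only, not actual NameNode code.
class BlockFileMap {
    // Forward mapping: file -> blocks it contains.
    private final Map<String, List<Long>> blocksByFile = new HashMap<>();
    // Reverse mapping: block id -> set of files that contain the block.
    private final Map<Long, Set<String>> filesByBlock = new HashMap<>();
    // Desired replication, kept per file.
    private final Map<String, Integer> replicationByFile = new HashMap<>();

    void addFile(String file, int replication, long... blockIds) {
        List<Long> blocks = new ArrayList<>();
        for (long id : blockIds) {
            blocks.add(id);
            filesByBlock.computeIfAbsent(id, k -> new HashSet<>()).add(file);
        }
        blocksByFile.put(file, blocks);
        replicationByFile.put(file, replication);
    }

    // Removes the file; returns only the blocks that no longer belong to any file.
    Set<Long> deleteFile(String file) {
        Set<Long> deletable = new HashSet<>();
        List<Long> blocks = blocksByFile.remove(file);
        replicationByFile.remove(file);
        if (blocks == null) {
            return deletable; // unknown file, nothing to do
        }
        for (Long id : blocks) {
            Set<String> owners = filesByBlock.get(id);
            if (owners == null) {
                continue;
            }
            owners.remove(file);
            if (owners.isEmpty()) {
                filesByBlock.remove(id);
                deletable.add(id);
            }
        }
        return deletable;
    }

    // Desired replication of a block: the maximum over the files containing it.
    int desiredReplication(long blockId) {
        int max = 0;
        Set<String> owners = filesByBlock.getOrDefault(blockId, Collections.<String>emptySet());
        for (String f : owners) {
            max = Math.max(max, replicationByFile.getOrDefault(f, 0));
        }
        return max;
    }
}
```

With /a (replication 3) holding blocks 1 and 2 and /b (replication 5) holding block 2, block 2 wants 5 replicas; deleting /b frees nothing and drops block 2 back to 3.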
> 2. race between cow() and delete(): The client invokes cow() with set of
> LocatedBlocks. Since there aren't any client side locks, by the time the
> Namenode processes the cow() command, the original block(s) could have been
> deleted.
The right interface in my opinion is not that you give blocks at all,
but do the clone at the file level.
    void cloneFile(Path source, Path destination) throws IOException
or something. Then the namespace can be locked while the data
structures are read and modified.
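As a rough illustration of why the file-level call avoids the race, here is a toy namespace where cloneFile and delete take the same lock. Everything here is an assumption made for the sketch: ToyNamespace is a made-up class, paths are plain Strings instead of Path to keep it self-contained, and a single synchronized monitor stands in for whatever namespace lock the NameNode actually uses.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a toy namespace, not the actual NameNode code.
class ToyNamespace {
    private final Map<String, List<Long>> blocksByFile = new HashMap<>();
    private final Map<Long, Integer> refCount = new HashMap<>();

    synchronized void create(String path, List<Long> blocks) throws IOException {
        if (blocksByFile.containsKey(path)) {
            throw new IOException("already exists: " + path);
        }
        blocksByFile.put(path, new ArrayList<>(blocks));
        for (Long b : blocks) {
            refCount.merge(b, 1, Integer::sum);
        }
    }

    // The source is looked up and the ref counts are bumped under one lock,
    // so a concurrent delete() runs entirely before or entirely after.
    synchronized void cloneFile(String source, String destination) throws IOException {
        List<Long> blocks = blocksByFile.get(source);
        if (blocks == null) {
            throw new FileNotFoundException(source);
        }
        if (blocksByFile.containsKey(destination)) {
            throw new IOException("already exists: " + destination);
        }
        blocksByFile.put(destination, new ArrayList<>(blocks));
        for (Long b : blocks) {
            refCount.merge(b, 1, Integer::sum);
        }
    }

    // Returns the blocks whose ref count dropped to 0 and can be deleted.
    synchronized List<Long> delete(String path) {
        List<Long> deletable = new ArrayList<>();
        List<Long> blocks = blocksByFile.remove(path);
        if (blocks == null) {
            return deletable;
        }
        for (Long b : blocks) {
            int left = refCount.get(b) - 1;
            if (left == 0) {
                refCount.remove(b);
                deletable.add(b);
            } else {
                refCount.put(b, left);
            }
        }
        return deletable;
    }
}
```

If the delete wins, the clone fails cleanly with FileNotFoundException; if the clone wins, the delete of the source frees no blocks because the destination still references them.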
-- Owen