On Mar 14, 2011, at 10:57 AM, Sanjay Radia wrote:
On Mar 12, 2011, at 8:43 AM, Allen Wittenauer wrote:
To me, this series of changes looks like it is going to make
running a grid much much harder for very little benefit. In
particular, I don't see the difference between running multiple NN/DN
combinations versus running federation, especially with client
side mount tables in play.
The main difference between independent HDFS clusters and HDFS federation
is that in federation the NNs share the DNs and the storage on those
DNs.
There is a very detailed document that describes this on the Jira.
If you are running a single NN and you don't need the scaling then
running and managing hadoop is for all practical purposes unchanged.
sanjay
Allen, not sure if I explained the difference above.
Based on the discussion we had at the HUG, I want to clarify a few things.
In federation the NNs and the DNs are part of a cluster. It is not as
if a data node is willing to store blocks for any NN anywhere in the
data center.
We still expect a data center to have multiple hadoop clusters each
with a set of data nodes and each cluster with 1 or more NNs.
A DN stores blocks for only ONE cluster.
You had asked about how one debugs a corrupt file or corrupt block.
In the old world a file's inode contains the block ids of its blocks.
There is also a mapping from block id to block location (i.e. which DN).
In the federated hdfs, each block is identified by a longer block id,
called the extended block id = blockPool Id + block id.
A block pool is owned by only ONE NN.
Hence if you are trying to locate a block, you map the extended
block id to the block location (i.e. the DN). This is the same as before,
except that the identifier of the block is merely longer.
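To make the lookup concrete, here is a rough Python sketch of the idea. It is only an illustration: the class name, the pool ids ("BP-A", "BP-B"), the DN names and the in-memory dict are all made up for this mail and are not the actual HDFS data structures.

```python
from typing import NamedTuple


class ExtendedBlockId(NamedTuple):
    """Hypothetical stand-in for the longer identifier: block pool id + block id."""
    block_pool_id: str  # names the block pool, which is owned by exactly one NN
    block_id: int       # the block id, as in non-federated HDFS


# Illustrative location map: extended block id -> DNs holding a replica.
# The same numeric block id can appear in two pools without clashing.
block_locations = {
    ExtendedBlockId("BP-A", 1001): ["dn1", "dn2"],
    ExtendedBlockId("BP-B", 1001): ["dn3"],
}


def locate(block_pool_id: str, block_id: int) -> list:
    """Same lookup as before federation; only the key is longer."""
    return block_locations.get(ExtendedBlockId(block_pool_id, block_id), [])
```

With this toy data, locate("BP-A", 1001) and locate("BP-B", 1001) return different DN lists even though the short block id is the same.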
If you are trying to debug from the point of view of the DN:
In federated HDFS, the blocks stored in the DN are segregated in
directories by the blockPool Id.
The block pool id can be mapped to an NN since each block pool has
only ONE owner.
Hence mapping from a block to a particular NN is easy: the first part
of the block's longer identifier tells you which NN owns that
block.
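The DN-side view can be sketched the same way. Again this is only a sketch under assumptions: the directory layout (data dir / block pool id / block file), the pool ids and the NN host names are hypothetical and chosen just to show the mapping, not taken from the real on-disk format.

```python
from pathlib import PurePosixPath

# Hypothetical table: each block pool id has exactly one owning NN.
pool_owner = {
    "BP-A": "nn1.example.com",
    "BP-B": "nn2.example.com",
}


def owning_nn(block_path: str, data_dir: str = "/data/dfs") -> str:
    """Map a block file on a DN to its owning NN.

    Assumes (for illustration only) that the first directory below
    data_dir is named after the block pool id.
    """
    relative = PurePosixPath(block_path).relative_to(data_dir)
    block_pool_id = relative.parts[0]
    return pool_owner[block_pool_id]
```

For example, under these assumptions a block file at /data/dfs/BP-B/blk_1001 maps back to the NN that owns pool BP-B.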
sanjay