On 04/22/2011 09:48 AM, Suresh Srinivas wrote:
> A few weeks ago, I had sent an email about the progress of HDFS
> federation development in HDFS-1052 branch. I am happy to announce
> that all the tasks related to this feature development are complete
> and it is ready to be integrated into trunk.

A couple of questions:

1. Can you please describe the significant advantages this approach has
over a symlink-based approach?

It seems to me that one could run multiple namenodes on separate boxes
and run multiple datanode processes per storage box configured with
something like:

first datanode process configuration
  fs.default.name = hdfs://nn1/
  dfs.data.dir = /drive1/nn1/,/drive2/nn1/...

second datanode process configuration
  fs.default.name = hdfs://nn2/
  dfs.data.dir = /drive1/nn2/,/drive2/nn2/...

...

Then symlinks could be used between nn1, nn2, etc. to provide a
reasonably unified namespace.  From the benefits listed in the design
document, it is not clear to me what substantial advantages federation
has over such a configuration.
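
To make the layout above concrete, here is a rough sketch of how the
per-process settings could be generated for any number of namenodes.
The function name datanode_configs and the host and drive names are
only illustrative assumptions; the two property names are the ones
used in the example above.

```python
def datanode_configs(namenodes, drives):
    """Return one settings dict per datanode process (one per namenode).

    Hypothetical helper: it only builds the key/value pairs shown in the
    example above and does not talk to Hadoop in any way.
    """
    configs = []
    for nn in namenodes:
        configs.append({
            # Each datanode process reports to exactly one namenode.
            "fs.default.name": f"hdfs://{nn}/",
            # Each process gets its own subdirectory on every drive,
            # so block storage for different namenodes never overlaps.
            "dfs.data.dir": ",".join(f"{drive}/{nn}/" for drive in drives),
        })
    return configs


if __name__ == "__main__":
    for cfg in datanode_configs(["nn1", "nn2"], ["/drive1", "/drive2"]):
        for key, value in cfg.items():
            print(f"{key} = {value}")
        print()
```

Each dict corresponds to the configuration of one datanode process on
a storage box; the symlinks between namespaces would still have to be
created separately.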

2. How much testing has been performed on this?  The patch modifies much
of the logic of Hadoop's central component, upon which the performance
and reliability of most other components of the ecosystem depend.  It
seems to me that such an invasive change should be well tested before it
is merged to trunk.  Can you please tell me how this has been tested
beyond unit tests?

Thanks!

Doug
