Re: [Discuss] Merge federation branch HDFS-1052 into trunk

suresh srinivas Tue, 26 Apr 2011 16:06:57 -0700

Doug, please reply back. I am planning to commit this by tonight, as I would
like to avoid unnecessary merge work and also avoid having to redo the merge
if SVN is re-organized.


On Tue, Apr 26, 2011 at 10:29 AM, suresh srinivas <[email protected]> wrote:

> Doug,
>
>
>> 1. Can you please describe the significant advantages this approach has
>> over a symlink-based approach?
>
> Federation is complementary to the symlink approach. You could choose to
> provide an integrated namespace using symlinks. However, client-side mount
> tables seem a better approach for several reasons:
> # Unlike symbolic links, client-side mount tables can go to the right
> namenode based on configuration. This avoids unnecessary RPCs to the
> namenodes to discover the target of a symlink.
> # With mount tables, the unavailability of a namenode where a symbolic link
> would be configured does not affect reaching the link's target.
> # Symbolic links need not be configured on every namenode in the cluster,
> and future changes need not be propagated to multiple namenodes. With
> client-side mount tables, this information lives in a central configuration.
>
> If a deployment still wants to use symbolic links, federation does not
> preclude it.
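
To illustrate the mount-table point above, here is a minimal sketch of how a client could pick a namenode from local configuration alone, with no RPC to discover a link target. It assumes a longest-prefix match over mount points; the table contents, host names, and the `resolve` function are hypothetical and are not the API of the federation branch.

```python
from typing import Dict, Tuple

# Hypothetical mount table: path prefix -> namenode address.
# All entries are illustrative values, not taken from any real cluster.
MOUNT_TABLE: Dict[str, str] = {
    "/user": "hdfs://nn1.example.com:8020",
    "/data": "hdfs://nn2.example.com:8020",
    "/data/logs": "hdfs://nn3.example.com:8020",
}


def resolve(path: str, table: Dict[str, str] = MOUNT_TABLE) -> Tuple[str, str]:
    """Return (namenode, path) for the longest mount point that covers path."""
    best = None
    for mount in table:
        # Match whole path components only, so "/database" is not under "/data".
        if path == mount or path.startswith(mount.rstrip("/") + "/"):
            if best is None or len(mount) > len(best):
                best = mount
    if best is None:
        raise KeyError(f"no mount point covers {path}")
    return table[best], path
```

Because the lookup is purely local, an unreachable namenode for `/user` has no effect on resolving a path under `/data/logs`.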
>
>
> > It seems to me that one could run multiple namenodes on separate boxes
> > and run multiple datanode processes per storage box
>
> There are several advantages to using a single datanode:
> # When you have a large number of namenodes (say 20), the cost of running
> separate datanodes in terms of process resources such as memory is huge.
> # Disk I/O management and storage utilization are much better with a single
> datanode, as it has a complete view of the storage.
> # In the approach you are proposing, you have several clusters to manage.
> With federation, however, all datanodes are in a single cluster with a single
> configuration, which is operationally easier to manage.
>
> > The patch modifies much of the logic of Hadoop's central component, upon
> > which the performance and reliability of most other components of the
> > ecosystem depend.
>
> That is not true.
>
> # Namenode is mostly unchanged in this feature.
> # Read/write pipelines are unchanged.
> # The changes are mainly in datanode:
> #* the storage, FSDataset, Directory and Disk scanners now have another
> level to incorporate block pool ID into the hierarchy. This is not a
> significant change that should cause performance or stability concerns.
> #* datanodes use a separate thread per NN, just like the existing thread
> that communicates with NN.
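
As a rough sketch of the two datanode changes listed above: the first function adds a block pool ID level to a block's storage path, and the second starts one thread per namenode, each reporting only the blocks in that namenode's pool. The directory layout, the function names, and the in-memory "report" are all assumptions made for illustration and do not reproduce the actual datanode code.

```python
import threading
from pathlib import PurePosixPath
from typing import Dict, List


def block_path(storage_dir: str, block_pool_id: str, block_id: int) -> PurePosixPath:
    """Build a block path with an extra level keyed by block pool ID.

    The layout here is hypothetical; only the idea of the added level
    comes from the description above.
    """
    return PurePosixPath(storage_dir) / block_pool_id / f"blk_{block_id}"


def report_to_all(
    namenodes: Dict[str, str], blocks: Dict[str, List[int]]
) -> Dict[str, List[int]]:
    """Run one thread per namenode; each reports its own block pool's blocks.

    namenodes maps a namenode name to its block pool ID.
    blocks maps a block pool ID to the block IDs stored for that pool.
    """
    received: Dict[str, List[int]] = {}
    lock = threading.Lock()

    def worker(nn: str, pool: str) -> None:
        report = sorted(blocks.get(pool, []))
        with lock:  # guard the shared result dict
            received[nn] = report

    threads = [
        threading.Thread(target=worker, args=(nn, pool))
        for nn, pool in namenodes.items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

Each thread touches only its own pool, which mirrors the point that the per-namenode threads behave like the existing single namenode thread.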
>
> > Can you please tell me how this has been tested beyond unit tests?
>
> As regards testing, we have passed 600+ tests. In Hadoop, these tests
> are mostly integration tests and not pure unit tests.
>
> While these tests have been extensive, we have also been testing this
> branch for the last 4 months, with QA validation that reflects our production
> environment. We have found the system to be stable and performing well, and
> have not found any blockers with the branch so far.
>
> HDFS-1052 has been open for more than a year now. I also sent an email
> about this merge around 2 months ago. There are 90 subtasks that have been
> worked on over the last couple of months under HDFS-1052. Given that there was
> enough time to ask these questions, your email a day before I am planning to
> merge the branch into trunk seems late!
>
> --
> Regards,
> Suresh
>
>


-- 
Regards,
Suresh