Thanks for all the feedback, folks. I have created a jira: https://issues.apache.org/jira/browse/HDFS-5324.
Let us continue detailed discussions there.

- Milind

---
Milind Bhandarkar
Chief Scientist
Pivotal
+1-650-523-3858 (W)
+1-408-666-8483 (C)

On Mon, Oct 7, 2013 at 9:50 PM, sanjay Radia <san...@hortonworks.com> wrote:

>
> On Oct 3, 2013, at 12:17 PM, Milind Bhandarkar wrote:
>
> > Exec Summary: For the last couple of months, we, at Pivotal, along with a
> > couple of folks in the community have been working on making Namespace
> > implementation in the namenode pluggable. We have demonstrated that it can
> > be done without major surgery on the namenode, and does not have noticeable
> > performance impact. We would like to contribute it back to Apache if there
> > is sufficient interest. Please let us know if you are interested, and we
> > will create a Jira and update the patch for in-progress work.
> > ……
>
> Milind,
> a reasonable idea, but best to discuss actual details in a jira. Some
> initial thoughts, to clear up some of the confusion (and accusations) in
> this thread:
>
> HDFS pluggability (and relation to pluggability added as part of
> Federation)
> - Pluggability and federation are orthogonal, although we did improve the
> pluggability of HDFS as part of the federation implementation. As Vinod has
> noted, the *block layer* was separated out as part of the federation work
> and hence makes the general development of new HDFS namespace
> implementations easier. Federation's pluggability was targeted towards
> someone writing a new NN and reusing the block storage layer via a library,
> and optionally living side-by-side with different implementations of the
> NN within the same cluster. Hence we added the notion of block pools and
> separated out the block management layer.
> - So your proposed work is clearly not in conflict with Federation or even
> with the pluggability that Federation added; philosophically, your
> proposal is complementary.
>
> Considerations: A Public API?
> The FileSystem/AbstractFileSystem APIs and the newly proposed
> AbstractFSNamesystem are targeting very different kinds of pluggability
> into Hadoop. The former takes a thin application API (FileSystem and
> FileContext) and makes it easy for users to plug in different filesystems
> (S3, LocalFS, etc.) as Hadoop-compatible filesystems. In contrast, the
> latter (the proposed AbstractFSNamesystem) is a fatter interface deep
> inside the HDFS implementation and makes parts of that implementation
> pluggable.
>
> I would not make your proposed AbstractFSNamesystem a public stable
> Hadoop API, but instead direct it towards HDFS developers who want to
> extend the implementation of HDFS more easily. Were you envisioning the
> AbstractFSNamesystem to be a stable public Hadoop API? If someone has
> their own private implementation of this new abstract class, would the
> HDFS community have the freedom to modify the abstract class in
> incompatible ways? These are discussions for the Jira.
>
> A somewhat related piece of work:
> Since Milind motivated his pluggability by a new NN implementation (that
> happens to use HBase), I will briefly mention an experiment in building a
> new NN that stores only a partial namespace in memory. The goal of this
> experiment was *not* to make the NN code more pluggable, but instead to
> provide an alternate implementation of the NN; hence it is orthogonal. A
> PhD student who worked as an intern at Hortonworks implemented a NN that
> stores only a partial namespace in RAM. She presented this at a HUG in
> Sunnyvale in Aug 2013. I have encouraged her to file a jira, but she wants
> to finish some more experiments before filing; I will file a jira on her
> behalf and refer to her work in the next day or so. It is a prototype that
> helps us understand how well the particular implementation choice for this
> alternate NN works. It would be interesting to see if her code changes fit
> into Milind's newly proposed AbstractFSNamesystem.
> My initial view is that it may not, but I will wait until Milind posts an
> initial strawman of the AbstractFSNamesystem before commenting (while
> subclassing interfaces can work very well, subclassing implementations can
> be very tricky to get right).
>
> Milind, please file the jira for further discussions.
>
> sanjay
>
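sanjay's parenthetical point, that subclassing interfaces tends to work well while subclassing implementations is tricky, is the classic fragile-base-class problem. The Python sketch below is a hypothetical illustration only: `Namesystem`, `InMemoryNamesystem`, `CountingSubclass` and `CountingWrapper` are invented names, not Hadoop's FSNamesystem or the proposed AbstractFSNamesystem, whose actual shape was still to be posted on HDFS-5324. It shows that a subclass of a concrete implementation silently depends on how the base class calls its own methods, while a wrapper that implements only the abstract interface does not.

```python
from abc import ABC, abstractmethod


class Namesystem(ABC):
    """Hypothetical interface: declares operations, carries no implementation."""

    @abstractmethod
    def mkdir(self, path: str) -> None: ...

    @abstractmethod
    def mkdirs(self, paths: list[str]) -> None: ...

    @abstractmethod
    def exists(self, path: str) -> bool: ...


class InMemoryNamesystem(Namesystem):
    """Hypothetical concrete implementation keeping the namespace in a set."""

    def __init__(self) -> None:
        self._dirs: set[str] = set()

    def mkdir(self, path: str) -> None:
        self._dirs.add(path)

    def mkdirs(self, paths: list[str]) -> None:
        # Internal detail: the bulk call reuses mkdir(); subclasses can observe this.
        for p in paths:
            self.mkdir(p)

    def exists(self, path: str) -> bool:
        return path in self._dirs


class CountingSubclass(InMemoryNamesystem):
    """Extends the *implementation*: depends on how the base writes mkdirs()."""

    def __init__(self) -> None:
        super().__init__()
        self.ops = 0

    def mkdir(self, path: str) -> None:
        self.ops += 1
        super().mkdir(path)

    def mkdirs(self, paths: list[str]) -> None:
        self.ops += len(paths)
        super().mkdirs(paths)  # base calls self.mkdir() again -> double count


class CountingWrapper(Namesystem):
    """Implements the *interface* by delegation: independent of the delegate's internals."""

    def __init__(self, inner: Namesystem) -> None:
        self._inner = inner
        self.ops = 0

    def mkdir(self, path: str) -> None:
        self.ops += 1
        self._inner.mkdir(path)

    def mkdirs(self, paths: list[str]) -> None:
        self.ops += len(paths)
        self._inner.mkdirs(paths)

    def exists(self, path: str) -> bool:
        return self._inner.exists(path)


if __name__ == "__main__":
    sub = CountingSubclass()
    sub.mkdirs(["/a", "/b"])
    print(sub.ops)  # 4: each directory counted twice

    wrapped = CountingWrapper(InMemoryNamesystem())
    wrapped.mkdirs(["/a", "/b"])
    print(wrapped.ops)  # 2
```

Both counters see the same single `mkdirs` call with two paths, yet the subclass reports 4 operations because the base `mkdirs()` re-enters the overridden `mkdir()`; if a later version of the base class stopped calling `mkdir()` internally, the subclass's count would change with it, even though nothing in its own code did.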