On Mon, Mar 14, 2011 at 6:12 PM, Konstantin Shvachko <shv.had...@gmail.com> wrote:
> Dhruba, good you are speaking up for federation.
> I consider it important as it means more support for the feature in the
> future.
>
> The purpose of my reply was to get this discussion going, as I found Allen's
> question unanswered for 2 weeks.
> The concern he has seems legitimate to me. If ops think federation will
> "make running a grid much much harder" I want to know why and how much
> harder, because cluster "manageability" is claimed as one of the objectives
> of federation.
>
> I sure am well familiar with the design, having been a part of it for a while.
> And all my concerns have been articulated and are well known, though not all
> of them are addressed.
>
> The way I see it now, Federation introduces
> - lots of code complexity to the system
> - harder manageability, according to Allen
> - potential performance degradation (tbd)
> And the main question for those 95% of users, who don't run large clusters
> or don't want to place all their compute resources in one data center, is:
> what is the advantage in supporting it?
>
> Performance-wise there are 2 main aspects:
> - Does federation give me the same cluster performance if I don't federate?
> - If I federate how much more throughput can I get?
>
This reminds me of multi-cell GFS (discussed by Quinlan & McKusick at
http://bit.ly/einKMn). I used to run some of those clusters, and compared to
standard single-master clusters of course they were more complex to manage.
However, if you have apps needing that much master capacity & that much
shared read+write bandwidth across large pools of storage nodes, it's worth
the trouble.

Assuming most people don't use federation, it shouldn't add complexity in the
common case, but it opens up some needed capabilities for large sites. Stuff
like datanode management would become more challenging in a multi-master
environment, but that's where automation comes in. If you don't have teams
building tools to manage your datacenter, it's likely you don't need
federation either.

I'm currently running a handful of HDFS clusters & my overall reaction to
federation is "that's cool, but I probably won't need it for a few years."
Seems like the sort of thing the vast majority of sites won't even encounter -
you'd just add datanodes to one master & start using it.

--travis

> Thanks,
> --Konstantin
>
> On Mon, Mar 14, 2011 at 10:43 AM, Dhruba Borthakur <dhr...@gmail.com> wrote:
>
>> Hi folks,
>>
>> The design for the federation work has been published and there is a very
>> well-written design document. It explains the pros-and-cons of each design
>> point. It would be nice if more people can review this document and provide
>> comments on how to make it better. The implementation is in progress but
>> that does not mean that the
>> "design-is-cast-in-stone-and-cannot-be-enhanced".
>>
>> Allen: can you please describe what you mean by "It sounds like merging into
>> trunk is extremely premature"? If we can make all unit tests pass
>> successfully on the branch, then do you think we should merge that branch
>> into the trunk?
>>
>> Konstantin: I agree that federation introduces new code complexity. But it
>> is a fact that introducing a new heavy-weight feature will add complexity.
>> If you have a different proposal (and implementation) to scale namenode,
>> please share it with us and we can then evaluate these designs in terms of
>> complexity/feature. If you have questions about certain issues in the
>> design, it would be great if you can ask them now. Hopefully, the folks
>> doing the implementation can then provide you performance numbers to
>> alleviate your concerns.
>>
>> From the way I look at it, I think the federation-feature is a huge
>> positive step in the right direction.
>>
>> thanks,
>> dhruba
>>
>> On Mon, Mar 14, 2011 at 10:28 AM, Konstantin Shvachko <
>> shv.had...@gmail.com> wrote:
>>
>>> Allen is right.
>>> This is a huge new feature with 86 jiras already filed, which
>>> substantially increases the complexity of the code base.
>>> Having an in-depth motivation and benchmarking will be needed before the
>>> community decides on adopting it for support.
>>> Thanks,
>>> --Konstantin
>>>
>>> On Sat, Mar 12, 2011 at 8:43 AM, Allen Wittenauer
>>> <awittena...@linkedin.com> wrote:
>>>
>>> > On Mar 3, 2011, at 2:41 PM, Suresh Srinivas wrote:
>>> >
>>> > > We have started pushing changes for namenode federation into the
>>> > > feature branch HDFS-1052. The work items are created as subtasks of
>>> > > the jira HDFS-1052 and are based on the design document published in
>>> > > the same jira. By the end of this week, we will complete pushing the
>>> > > changes to HDFS-1052 branch. Though the changes in these jiras are
>>> > > already committed, please do provide your feedback on either
>>> > > HDFS-1052 or its subtasks. New items that come out of the feedback
>>> > > will be addressed in new jiras.
>>> > >
>>> > > Current status of the development:
>>> > > # The testing of this feature is underway. Most of the basic
>>> > > functionality has been tested both for a single namenode cluster (for
>>> > > backward compatibility) and with multiple namenodes.
>>> > > # All the existing tests and newly added tests pass (same as trunk).
>>> > >
>>> > > We plan on merging this branch to trunk after a week or two. This will
>>> > > help us continue to make future changes on the trunk. I will send an
>>> > > announcement before merging the federation branch into trunk.
>>> > >
>>> >
>>> > It sounds like merging into trunk is extremely premature. That
>>> > said, I'm still trying to understand the why's around this.
>>> >
>>> > To me, this series of changes looks like it is going to make running
>>> > a grid much much harder for very little benefit. In particular, I don't
>>> > see the difference between running multiple NN/DN combinations versus
>>> > running federation, especially with client side mount tables in play.
>>> >
>>
>> --
>> Connect to me at http://www.facebook.com/dhruba
>