Thanks for starting off the discussion.

> This is a huge new feature with 86 jiras already filed, which substantially increases the complexity of the code base.

These 86 jiras are filed in a feature branch. We decided to make these changes in smaller increments instead of a jumbo patch. This was done in good faith, as the community did not want a jumbo patch (as seen in several discussions), to make reviewing the patch easy, and to record the changes for reference. The main changes went into a few jiras. The others mainly fix test failures, add tests, and fix bugs introduced during development. Please review the patch and provide feedback; we will address the concerns.

> Having an in-depth motivation and benchmarking will be needed before the community decides on adopting it for support.

This comes as a surprise, especially from Konstantin :-). The first part of the proposal and the design both cover motivation. As for benchmarking: if you look at the design, there is no big change in the I/O subsystem. Most of the changes are in the organization of storage, to introduce block pools, the block pool ID, a thread per namenode in the datanode, and upgrade/rollback. I am not sure what concerns you have about benchmarking. So far, our tests show no performance difference with federation. As we developed this feature, some significant improvements have been made to the system: fast snapshots (snapshot time down from 1 hr 45 mins to 1 min!), fast startup, cleanup of storage, fixes for multithreading issues in several places, decommissioning improvements, etc.

> The purpose of my reply was to get this discussion going, as I found Allens question unanswered for 2 weeks.

My email was sent on March 3rd. Allen's email was sent on March 12th.

> The concern he has seems legitimate to me. If ops think federation will "make running a grid much much harder" I want to know why and how much harder.

I would like to understand the concerns here. Allen, please add details.

> The way I see it now, Federation introduces
> - lots of code complexity to the system
> - harder manageability, according to Allen
> - potential performance degradation (tbd)

I have addressed these already.

> And the main question for those 95% of users, who don't run large clusters or don't want to place all their compute resources in one data center, is what is the advantage in supporting it?

This is a valid concern. Hence the single-namenode configuration that most installations run today will run as is. We put a lot of development and testing effort into ensuring this.

Regards,
Suresh