Thanks for starting off the discussion.

> This is a huge new feature with 86 jiras already filed, which
> substantially increases the complexity of the code base.
These are 86 jiras filed in a feature branch. We decided to make these
changes in smaller increments instead of a jumbo patch. This was done in
good faith, as the community did not want a jumbo patch (as seen in several
discussions), to make reviewing the patch easy and to record the changes
for reference. The main changes have gone in through a few jiras. The others
mainly fix test failures, add tests, and fix bugs introduced during
development. Please review the patch and provide feedback; we will address
the concerns.

> Having an in-depth motivation and benchmarking will be needed before the
> community decides on adopting it for support.
This comes as a surprise, especially from Konstantin :-). The first part of
the proposal and the design both cover the motivation.

As regards benchmarking: if you look at the design, there is no big change
in the I/O subsystem. Most of the changes are in the organization of storage,
to introduce block pools, block pool IDs, a thread per namenode in each
datanode, and upgrade/rollback. I am not sure what concerns you have about
benchmarking. So far our tests show no difference with federation.

As we developed this feature, some significant improvements were made
to the system: fast snapshots (snapshot time down from 1 hr 45 min to 1
min!), fast startup, cleanup of storage, fixes for multithreading issues in
several places, decommissioning improvements, etc.

> The purpose of my reply was to get this discussion going, as I found
> Allens question unanswered for 2 weeks.
My email was sent on March 3rd. Allen's email was sent on March 12th.

> The concern he has seems legitimate to me. If ops think federation will
> "make running a grid much much harder" I want to know why and how much
> harder.
I would like to understand the concerns here. Allen, please add details.

> The way I see it now, Federation introduces
> - lots of code complexity to the system
> - harder manageability, according to Allen
> - potential performance degradation (tbd)
I have addressed these already.

> And the main question for those 95% of users, who don't run large clusters
> or don't want to place all their compute resources in one data center, is
> what is the advantage in supporting it?
This is a valid concern. Hence the single-namenode configuration that most
installations run today will continue to run as is. We put a lot of
development and testing effort into ensuring this.

Regards,
Suresh
