Another dimension to this discussion that I'd like to address is the
provision for a 1.10 version.  In fact, I lean towards having 1.10
nominated as the pre-2.x LTS version instead of 1.9.x.  I am in favor of
the basic LTS proposal, but I think that additional accommodations to ease
the upgrade path from pre-2.x to 2.x must be considered before any adoption
of an LTS plan.

The largest change that I'd like to propose for 1.10 is that the minimum
Java language version be bumped to Java 8 so that code merged between
versions can use the same language constructs.  As it is now, code written
for 1.9.x cannot use lambdas, streams, or any of the other "modern"
features.  When merging code forward, one is left with the option of not
using those features or of rewriting the code, which, if not done
perfectly, could introduce a different set of bugs between versions.
Likewise, if someone wanted to backport a feature from 2.x into the 1.9.x
code base, additional changes, beyond those required because of the 2.x
restructuring, are likely to be necessary.
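
To make the merging problem concrete, here is a small sketch of the same
logic written both ways.  This is not Accumulo code; the class, the method
names, and the "online:" prefix are hypothetical and only illustrate the
kind of hand translation a forward merge or a backport would require.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical example, not taken from the Accumulo code base.
public class TabletFilterExample {

    // Pre-Java 8 style: an explicit loop, the form a 1.9.x branch would need.
    public static List<String> onlineTabletsLoop(List<String> tablets) {
        List<String> result = new ArrayList<>();
        for (String t : tablets) {
            if (t.startsWith("online:")) {
                result.add(t.substring("online:".length()));
            }
        }
        return result;
    }

    // Java 8 style: the same filter and transform as a stream pipeline.
    public static List<String> onlineTabletsStream(List<String> tablets) {
        return tablets.stream()
            .filter(t -> t.startsWith("online:"))
            .map(t -> t.substring("online:".length()))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> tablets =
            Arrays.asList("online:t1", "offline:t2", "online:t3");
        System.out.println(onlineTabletsLoop(tablets));
        System.out.println(onlineTabletsStream(tablets));
    }
}
```

Both methods are meant to return the same result, but every merge between
branches would require someone to translate one form into the other by
hand and verify that the behavior did not change.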

The migration from Accumulo 1.9.x to 2.x is not straightforward and will
require changes to Accumulo clients.  However, the largest obstacle to
upgrading to 2.x is the Hadoop 3 requirement.  This is a major,
non-trivial requirement change, and it is going to take significant effort
(and time) for large-scale deployments to develop against and then upgrade
to Hadoop 3.  There is going to be significant work required to adequately
test the necessary client changes and then upgrade the deployed systems,
first to Hadoop 3 and then to Accumulo 2.x.  Until they can, they are going
to be on a pre-2.x Accumulo version.

With code frozen at 1.9.x, large deployments are going to need to make some
hard decisions - do they continue to use 1.9.x as released, or do they make
some patched Frankenstein version?  If they find that they need to patch
aggressively to get features that improve current operations, how much
additional work is going to be required if / when they are in a position to
upgrade?  How much of that work would further delay upgrading to Hadoop 3 /
Accumulo 2.x?

Having features released by the community eases support across the whole
ecosystem. We will all have access to the same code base, the code will be
exercised by the continuous integration tests, and it provides greater
assurance that those features will be available once an upgrade to 2.x is
possible. Otherwise, reasoning about what "version" is actually running and
what that implies when requesting support from the community is just that
much harder for everyone.

My opinion is that if we can accommodate some feature improvements as
groups work toward adopting a Hadoop 3 / 2.x deployment, then we can reduce
the work required across the community and the users.  Freezing at 1.9.x
for pre-2.x would instead place an additional burden on the users.

I am in favor of adopting an LTS, but I think we really need to consider,
in the LTS plan, the impact that requiring Hadoop 3 is having on upgrading
to Accumulo 2.x.
