For #2, from what I've read, we should definitely bump the Hadoop
dependency in 1.5.1-SNAPSHOT to 2.1.0-beta, and, given what Ted replied
with, to 2.2.0-beta for the hadoop-2 profile.
I probably stated this before, but I'd much rather see more effort in
testing Accumulo 1.5.x (and 1.6.0 as that will be feature frozen soon)
against hadoop-2 (like Mike's point about HA). I'm not sure if anyone
ever did testing of Accumulo with the hadoop-2 features -- I seem to
recall that it was more a matter of testing whether Accumulo runs on
both hadoop 1 and 2.
If we can maintain a single artifact, that would definitely be easiest
for users, but falling back to user-built artifacts or convenience
releases isn't the end of the world.
As far as commits go, I'd like to see as much separation as possible, but
it's understandable if the changes overlap and don't make sense to split
out.
On 10/14/13 12:55 PM, Sean Busbey wrote:
Hey All,
I'd like to restart the conversation from the end of July / start of August about
Hadoop 2 support on the 1.4 branch.
Specifically, I'd like to get some requirements ironed out so I can file
one or more jiras. I'd also like to get a plan for applying the changes.
=requirements
Here are the requirements I have from the last thread:
1) Maintain existing 1.4 compatibility
The only thing I see listed in the pom is Apache release 0.20.203.0. (1.4.4
tag)[1]
I don't see anything in the README[2] nor the user manual[3] on other
versions being supported.
2) Gain Hadoop 2 support
At the moment, I'm presuming this means Apache release 2.0.4-alpha since
that's what 1.5.0 builds against for Hadoop 2.
3) Test for correctness on given versions, with >= 5 node cluster
* Unit Tests
* Functional Tests
* 24hr continuous + verification
* 24hr continuous + verification + agitation
* 24hr random walk
* 24hr random walk + agitation
Keith mentioned running these against a CDH4 cluster, but I presume that
since Apache Releases are our stated compatibilities it would actually be
against whatever versions we list. Based on #1 and #2 above, I would expect
that to be Apache Hadoop 0.20.203.0 and Apache Hadoop 2.0.4-alpha.
4) Binary packaging
4a) Either source produces a single binary for all accepted versions
or
4b) Instructions for building from source for each version, plus some way
to flag what (if any) convenience binaries are made for the release.
=application
There will be many back-ported patches. Not much active development happens
on 1.4.x now, but I presume this should still all go onto a feature branch?
Is the community preference that eventually all the changes become a single
commit (or one-per-subtask if there are multiple jiras) on the active 1.4
development branch, or that the original patches remain broken out?
For what it's worth, I'd recommend keeping them broken out. (And that's how
the initial development against CDH4 has been done.)
[1] http://bit.ly/1fxucMe
[2] http://bit.ly/192zUAJ
[3] http://accumulo.apache.org/1.4/user_manual/Administration.html#Dependencies