On 17/11/11 02:06, Scott Carey wrote:
> On 11/16/11 3:51 PM, "Nathan Roberts"<[email protected]> wrote:
>> On 11/16/11 4:43 PM, "Arun C Murthy"<[email protected]> wrote:
>>> I propose we adopt the convention that a new major version should be a
>>> superset of the previous major version, features-wise.
>> Just so I'm clear: this is only guaranteed at the time the new major
>> version is started. A day later, a previous major line may merge a feature
>> from trunk, and then it's no longer the case that 2.x.y is a superset. If
>> that's the case, I'm not sure of the value of the convention. We could say
>> that new major versions always start from trunk, but that doesn't have
>> meaning outside of the developer community.
> I don't think one can say in general that major versions are a superset of
> previous major versions; you would then need a SuperMajor version number
> for the (rare) times that rule was broken. In other words, the major
> version number really can't carry any restrictions.
> Perhaps, however, one could say that minor versions are supersets of prior
> minor versions, if one were to define 'superset'.
> It's going to be hard to claim that the 0.23 branch is a superset of 0.22
> -- after all, there is no JobTracker, and all sorts of stuff has been
> removed or replaced with something else. Whether that constitutes a
> superset gets into a lot of semantics of what we mean by 'superset'.
> Perhaps, like 'feature' or 'bug fix', it is best not to get into the
> semantics of defining what we mean by 'superset', and instead to define
> version-number meaning only in terms of compatibility classifications,
> especially since the compatibility classification has implications for all
> of these other things -- and, IMO, more clearly useful ones. For example,
> consider that a "bug fix" may break wire compatibility, that a tiny
> harmless change can be considered a "new feature", or that replacing a
> single link in a UI could be considered breaking a "superset" rule.
I think it would be good to distinguish user-API supersets/subsets from
internal supersets/subsets:
- 0.23 is a superset of the MR and HDFS APIs compatible with previous
versions (I don't know or care whether it is a proper superset). The goal
here is that end-user apps and higher levels in the stack (in-ASF and
out-of-ASF) should work, though testing is required to verify this.
A failure of the layers above to work with 0.23+ should be considered a
regression, looked at, and then either dismissed as "you weren't meant to
do that" or treated as the trigger for a fix.
- 0.23 has changed the back-end means by which jobs are scheduled; the
monitoring APIs have changed, etc. Where people will see a visible
difference is in the JT Web UI. That's not an API-level change.
A failure of any code that hooks into this bit of the system to compile
or run against 0.23 is something people can feel slightly sorry about,
but not enough to trigger reversions.
What I will miss in 0.23 is the MiniMRCluster, which I consider to be
part of the API. Certainly it's why I pull
hadoop-common-test-0.20.20x.jar into downstream builds: it is the
simplest way to do basic JUnit tests of MR operations, and the most
lightweight way to do single-machine Hadoop runs over small datasets.
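For anyone who hasn't used it, the downstream-test pattern being described is roughly the following (a sketch against the 0.20-era mapred API; the class name MiniMRSmokeTest and the commented-out job wiring are illustrative, not from any real project):

```java
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MiniMRCluster;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

// Hypothetical downstream test that spins up an in-process MR cluster,
// the way consumers of hadoop-common-test-0.20.20x.jar typically do.
public class MiniMRSmokeTest {

    private MiniMRCluster mr;

    @Before
    public void setUp() throws Exception {
        // Two task trackers, local filesystem as the "namenode" URI,
        // one directory per tracker -- all inside the JUnit JVM.
        mr = new MiniMRCluster(2, "file:///", 1);
    }

    @Test
    public void testJobSubmission() throws Exception {
        // createJobConf() returns a JobConf pre-pointed at the mini
        // cluster's JobTracker, so jobs run against it, not a real grid.
        JobConf conf = mr.createJobConf();
        conf.setJobName("smoke");
        // ... set mapper/reducer/input/output paths here, then submit
        // with JobClient.runJob(conf) and assert on the job counters ...
    }

    @After
    public void tearDown() {
        if (mr != null) {
            mr.shutdown();
        }
    }
}
```

This is exactly the kind of downstream code that breaks when MiniMRCluster disappears: it compiles only against the test artifact, so its loss is an API-surface question, not just an internal one.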