On 17/11/11 02:06, Scott Carey wrote:
> On 11/16/11 3:51 PM, "Nathan Roberts"<[email protected]> wrote:
>> On 11/16/11 4:43 PM, "Arun C Murthy"<[email protected]> wrote:
>>> I propose we adopt the convention that a new major version should be a
>>> superset of the previous major version, features-wise.
>> Just so I'm clear: this is only guaranteed at the time the new major
>> version is started. A day later, a previous major line may merge a feature
>> from trunk, and then it's no longer the case that 2.x.y is a superset. If
>> that's the case, I'm not sure of the value of the convention. We could say
>> that new major versions always start from trunk, but that doesn't have
>> meaning outside of the developer community.
> I don't think one can say in general that major versions are a superset of
> previous major versions; you would then need a SuperMajor version number
> for the (rare) times that rule was broken. In other words, the major
> version number really can't carry any restrictions.
> Perhaps, however, one could say that minor versions are supersets of prior
> minor versions, if one were to define 'superset'.
> It's going to be hard to claim that the 0.23 branch is a superset of 0.22
> -- after all, there is no JobTracker, and all sorts of stuff has been
> removed or replaced with something else. Whether that constitutes a
> superset gets into a lot of semantics of what we mean by 'superset'.
> Perhaps, like 'feature' or 'bug fix', it is best not to get into the
> semantics of defining what we mean by 'superset', and instead to define
> version-number meaning only in terms of compatibility classifications,
> especially since the compatibility classification has implications for all
> of these other things -- and, IMO, more clearly useful ones. For example,
> consider that a "bug fix" may break wire compatibility, that a tiny
> harmless change can be considered a "new feature", or that replacing a
> single link in a UI could be considered breaking a "superset" rule.
I think it would be good to distinguish user-API supersets/subsets from
internal supersets/subsets:
- 0.23 is a superset of the MR and HDFS APIs compatible with previous
versions (I don't know or care whether it is a proper superset). The goal
here is that end-user apps and higher levels in the stack (in-ASF and
out-of-ASF) should work, though testing is required to verify this.
A failure of the layers above to work with 0.23+ should be considered a
regression, looked at, and then either dismissed as "you weren't meant to
do that" or treated as the trigger for a fix.
- 0.23 has changed the back-end means by which jobs are scheduled; the
monitoring APIs have changed, etc. Where people will see a visible
difference is in the JT Web UI. That's not an API-level change.
A failure of any code that hooks into this bit of the system to compile
or run against 0.23 is something people can feel slightly sorry about,
but not enough to trigger reversions.
What I will miss in 0.23 is the MiniMRCluster, which I consider to be
part of the API. Certainly it's why I pull
hadoop-common-test-0.20.20x.jar into downstream builds: it is the
simplest way to do basic JUnit tests of MR operations, and the most
lightweight way to do single-machine Hadoop runs over small datasets.
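For anyone who hasn't used it, the downstream-test pattern being described is roughly the following (a sketch against the 0.20-era mapred API; the class name MiniMRSmokeTest and the commented-out job wiring are illustrative, not from any real project):

```java
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MiniMRCluster;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

// Hypothetical downstream test that spins up an in-process MR cluster,
// the way consumers of hadoop-common-test-0.20.20x.jar typically do.
public class MiniMRSmokeTest {

    private MiniMRCluster mr;

    @Before
    public void setUp() throws Exception {
        // Two task trackers, local filesystem as the "namenode" URI,
        // one directory per tracker -- all inside the JUnit JVM.
        mr = new MiniMRCluster(2, "file:///", 1);
    }

    @Test
    public void testJobSubmission() throws Exception {
        // createJobConf() returns a JobConf pre-pointed at the mini
        // cluster's JobTracker, so jobs run against it, not a real grid.
        JobConf conf = mr.createJobConf();
        conf.setJobName("smoke");
        // ... set mapper/reducer/input/output paths here, then submit
        // with JobClient.runJob(conf) and assert on the job counters ...
    }

    @After
    public void tearDown() {
        if (mr != null) {
            mr.shutdown();
        }
    }
}
```

This is exactly the kind of downstream code that breaks when MiniMRCluster disappears: it compiles only against the test artifact, so its loss is an API-surface question, not just an internal one.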