[ https://issues.apache.org/jira/browse/HDFS-11096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15685038#comment-15685038 ]

Andrew Wang commented on HDFS-11096:
------------------------------------

I talked offline with [~kasha] about compatibility in general, which was very 
helpful. Some notes:

h2. Source and binary compatibility

From the API guidelines:

{quote}
Public-Stable APIs must be deprecated for at least one major release prior to 
their removal in a major release.
{quote}

From the ABI guidelines:

{quote}
In particular for MapReduce applications, the developer community will try our 
best to support provide binary compatibility across major releases e.g. 
applications using org.apache.hadoop.mapred.
...
APIs are supported compatibly across hadoop-1.x and hadoop-2.x. See 
Compatibility for MapReduce applications between hadoop-1.x and hadoop-2.x for 
more details.
{quote}

The intention encoded in these guidelines is that we should strive not to break 
API or ABI compatibility in a major release. Regarding Public/Stable APIs, I 
think this means we can't remove one in 3.0 unless it was deprecated in 2.2.
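
As a concrete (and hypothetical) illustration of that policy, a Public/Stable 
API slated for removal in 3.0 would need to have already shipped deprecated in 
a 2.x release, along the lines of the sketch below. The class and method names 
are invented; only the annotations come from hadoop-annotations.

{code:java}
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;

// Hypothetical Public/Stable API, invented for illustration.
@InterfaceAudience.Public
@InterfaceStability.Stable
public class LegacyJobSubmitter {

  /**
   * @deprecated in 2.x. Under the guideline above, this can only be removed
   * in 3.0 because the deprecation shipped a major release earlier.
   */
  @Deprecated
  public void submitLegacyJob() {
    // behavior unchanged while deprecated
  }
}
{code}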

There are also other ways of breaking ABI compatibility (e.g. adding a new 
abstract method to an interface), which I think should be included under this 
guideline.
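
To make the interface example concrete, here's a minimal sketch (the interface 
and method names are invented, not an actual Hadoop API). Adding a plain 
abstract method to a Public/Stable interface is source-compatible for callers 
but binary-incompatible for existing implementations; giving the new method a 
default body is one way to avoid the break.

{code:java}
public interface RecordSink {
  void write(String record);   // present since 2.x

  // Added in 3.x. If this were a plain abstract method, implementations
  // compiled against the 2.x interface would still load, but the first call
  // to flush() on them would throw java.lang.AbstractMethodError -- exactly
  // the ABI break described above. The default body avoids that.
  default void flush() {
    // no-op so 2.x-compiled implementations keep working
  }
}
{code}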

Since users can bundle MR jars with their application, MR compat is somewhat 
less important than HDFS/YARN compatibility.

h2. Wire compatibility

Client/server wire compatibility is important since clients might want to 
read/write data or submit jobs across versions.

Server/server compatibility is important for rolling upgrade.

From the compat guide:

{quote}
Compatibility can be broken only at a major release, though breaking 
compatibility even at major releases has grave consequences and should be 
discussed in the Hadoop community.
{quote}

If we had to prioritize, I think client/server compatibility is the more 
important of the two, though based on my audit of the HDFS PBs for alpha1, 
server/server also seemed okay.
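
For intuition on why adding new optional fields to a PB message is 
wire-compatible while removing or renumbering existing fields is not, here's a 
small sketch that uses the protobuf-java runtime directly rather than the HDFS 
generated classes; the field numbers and values are made up.

{code:java}
import com.google.protobuf.CodedInputStream;
import com.google.protobuf.CodedOutputStream;
import com.google.protobuf.WireFormat;
import java.io.ByteArrayOutputStream;

public class WireCompatSketch {
  public static void main(String[] args) throws Exception {
    // "3.x" writer: field 1 is known to everyone, field 2 is new.
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    CodedOutputStream out = CodedOutputStream.newInstance(baos);
    out.writeUInt64(1, 42L);
    out.writeString(2, "added in 3.x");
    out.flush();

    // "2.x" reader: understands only field 1 and skips anything else.
    CodedInputStream in = CodedInputStream.newInstance(baos.toByteArray());
    int tag;
    while ((tag = in.readTag()) != 0) {
      if (WireFormat.getTagFieldNumber(tag) == 1) {
        System.out.println("field 1 = " + in.readUInt64());
      } else {
        in.skipField(tag);  // unknown field: skipped, not a parse error
      }
    }
  }
}
{code}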

h2. Discussion

The biggest need here is for testing.

Source compatibility testing is the easiest, and it's relatively well covered. 
Downstream projects have been picking up 3.0.0-alpha1, and here at Cloudera, 
we've got all of the CDH projects compiling against alpha1, with fixes posted.

Binary compatibility is more difficult, and it isn't covered by Cloudera's 
internal testing since we compile all of CDH as a monolith. JACC does cover it, 
though, and I've set up [nightly 
runs|https://builds.apache.org/view/H-L/view/Hadoop/job/Hadoop-trunk-JACC/] for 
trunk on Jenkins.

Wire compatibility is the most difficult. There's no automated check for PB or 
REST compatibility, and setting up cross-version clusters is essentially 
impossible in a unit test. This has been a problem even within just the 2.x 
line, so there's a real need for better cross-version integration testing.
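
As a sketch of what an automated PB check could look like (nothing like this 
exists today, and the class name and handling below are assumptions): dump 
descriptor sets for the two versions with protoc --descriptor_set_out and diff 
them for removed fields or changed types. Nested messages and services are 
ignored for brevity.

{code:java}
import com.google.protobuf.DescriptorProtos.DescriptorProto;
import com.google.protobuf.DescriptorProtos.FieldDescriptorProto;
import com.google.protobuf.DescriptorProtos.FileDescriptorProto;
import com.google.protobuf.DescriptorProtos.FileDescriptorSet;
import java.io.FileInputStream;
import java.util.HashMap;
import java.util.Map;

public class PbCompatCheck {
  public static void main(String[] args) throws Exception {
    // args[0]/args[1]: descriptor sets built from e.g. branch-2 and trunk .proto files.
    FileDescriptorSet oldSet = FileDescriptorSet.parseFrom(new FileInputStream(args[0]));
    FileDescriptorSet newSet = FileDescriptorSet.parseFrom(new FileInputStream(args[1]));

    // Index the new side by "MessageName#fieldNumber".
    Map<String, FieldDescriptorProto> newFields = new HashMap<String, FieldDescriptorProto>();
    for (FileDescriptorProto file : newSet.getFileList()) {
      for (DescriptorProto msg : file.getMessageTypeList()) {
        for (FieldDescriptorProto f : msg.getFieldList()) {
          newFields.put(msg.getName() + "#" + f.getNumber(), f);
        }
      }
    }

    // Any old field that vanished or changed type is a potential wire break.
    for (FileDescriptorProto file : oldSet.getFileList()) {
      for (DescriptorProto msg : file.getMessageTypeList()) {
        for (FieldDescriptorProto oldField : msg.getFieldList()) {
          FieldDescriptorProto newField =
              newFields.get(msg.getName() + "#" + oldField.getNumber());
          if (newField == null) {
            System.out.println("REMOVED: " + msg.getName() + "." + oldField.getName());
          } else if (newField.getType() != oldField.getType()) {
            System.out.println("TYPE CHANGED: " + msg.getName() + "." + oldField.getName());
          }
        }
      }
    }
  }
}
{code}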

If you're interested in compatibility, additional input on prioritization and 
test strategy would be appreciated.

> Support rolling upgrade between 2.x and 3.x
> -------------------------------------------
>
>                 Key: HDFS-11096
>                 URL: https://issues.apache.org/jira/browse/HDFS-11096
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: rolling upgrades
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Andrew Wang
>            Priority: Blocker
>
> trunk has a minimum software version of 3.0.0-alpha1. This means we can't 
> rolling upgrade between branch-2 and trunk.
> This is a showstopper for large deployments. Unless there are very compelling 
> reasons to break compatibility, let's restore the ability to rolling upgrade 
> to 3.x releases.


