Hi Andrew,

bq. Source and binary compatibility are not required for 3.0.0. It's a new 
major release, and there are known, documented incompatibilities in this regard.

Technically, that is true. In practice, however, we should retain
compatibility as much as we can. Otherwise, we could unintentionally break
downstream projects, third-party libraries, and existing users' applications.
A quick example is a blocker issue I just reported, HADOOP-15059, which breaks
old (2.x) MR applications on a 3.0 deployment due to a token format
incompatibility.
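
To illustrate the failure mode, here is a minimal, hypothetical sketch (mine,
not taken from HADOOP-15059 itself) of why a token storage format change
breaks old readers. It assumes a credentials file layout of an "HDTS" magic
header followed by a one-byte format version, where an old client only
understands version 0 (Writable); treat those details as assumptions, not a
spec:

  // Hypothetical diagnostic: peek at a token storage file's header and
  // report whether a 2.x-era reader (which only knows version 0) could
  // parse it. Assumed layout: "HDTS" magic + 1-byte format version.
  import java.io.DataInputStream;
  import java.io.FileInputStream;
  import java.io.IOException;
  import java.util.Arrays;

  public class TokenFileSniffer {
      private static final byte[] MAGIC = {'H', 'D', 'T', 'S'};

      public static void main(String[] args) throws IOException {
          try (DataInputStream in =
                  new DataInputStream(new FileInputStream(args[0]))) {
              byte[] magic = new byte[4];
              in.readFully(magic);
              if (!Arrays.equals(magic, MAGIC)) {
                  System.out.println("Not a token storage file");
                  return;
              }
              int version = in.readByte();
              // An old reader hard-coded to version 0 fails on anything else.
              System.out.println(version == 0
                  ? "Legacy Writable format: a 2.x client can read this"
                  : "Newer format (version " + version
                      + "): a 2.x client would fail here");
          }
      }
  }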


bq. To follow up on my earlier email, I don't think there's need for a bridge 
release given that we've successfully tested rolling upgrade from 2.x to 3.0.0.

Did we hit the same issue as HADOOP-15059 in that testing? If so, I am curious
what rolling upgrade means here - IMO, an upgrade that breaks running
applications shouldn't be recognized as "rolling". Am I missing anything?



Thanks,


Junping


________________________________
From: Andrew Wang <andrew.w...@cloudera.com>
Sent: Wednesday, November 15, 2017 10:34 AM
To: Junping Du
Cc: Wangda Tan; Steve Loughran; Vinod Kumar Vavilapalli; Kai Zheng; Arun 
Suresh; common-dev@hadoop.apache.org; yarn-...@hadoop.apache.org; Hdfs-dev; 
mapreduce-...@hadoop.apache.org
Subject: Re: [DISCUSS] A final minor release off branch-2?

Hi Junping,

On Wed, Nov 15, 2017 at 1:37 AM, Junping Du 
<j...@hortonworks.com> wrote:
Thanks, Vinod, for bringing up this discussion; it is just in time.

I agree with most responses that option C is not a good choice, as our
community bandwidth is precious and we should focus development, testing, and
deployment on a very limited set of mainstream branches. Of course, we should
still follow the Apache way and allow any interested committer to roll his/her
own release on top of the mainstream releases, given a specific requirement.

I am not biased toward option A or B (I will discuss this later), but I think
a bridge release for upgrading to, and rolling back from, 3.x is very
necessary. The reasons are obvious:
1. Given the lessons learned from the 1.x to 2.x migration, no matter how
careful we try to be, there is still a chance that some level of compatibility
(source, binary, configuration, etc.) gets broken in the move to a new major
release. Some of these incompatibilities can only be identified at runtime,
after the GA release is widely deployed on production clusters - we have tons
of downstream projects and numerous configurations, and we cannot cover them
all with in-house deployment and testing.

Source and binary compatibility are not required for 3.0.0. It's a new major 
release, and there are known, documented incompatibilities in this regard.

That said, we've done far, far more in this regard compared to previous major 
or minor releases. We've compiled all of CDH against Hadoop 3 and run our suite 
of system tests for the platform. We've been testing in this way since 
3.0.0-alpha1 and found and fixed plenty of source and binary compatibility 
issues during the alpha and beta process. Many of these fixes trickled down 
into 2.8 and 2.9.

2. From the recent classpath isolation work, I was surprised to find that many
of our downstream projects (HBase, Tez, etc.) are still consuming non-public,
server-side APIs of Hadoop, to say nothing of projects/products outside the
Hadoop ecosystem. Our API compatibility tests do not (and should not) cover
these cases. We can claim that a new major release shouldn't be responsible
for these private API changes, but given the possibility of breaking existing
applications in some way, users could be very hesitant to migrate to a 3.x
release if there is no safe way to roll back.

This is true for 2.x releases as well. Similar to the previous answer, we've 
compiled all of CDH against Hadoop 3, providing a much higher level of 
assurance even compared to 2.x releases.
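
For concreteness, here is a hypothetical sketch (mine, not from the thread) of
the kind of private-API dependency point 2 describes: downstream code that
compiles happily against an internal, @InterfaceAudience.Private class such as
DFSClient instead of the public FileSystem API. The exact constructor and
method signatures below are illustrative, not guaranteed; the point is that
nothing stops this from compiling, yet it can break across a major release:

  // Hypothetical downstream code consuming a non-public, server-side API.
  // DFSClient is annotated @InterfaceAudience.Private, so it may change
  // shape between releases; the compiler gives no warning about using it.
  import java.net.InetSocketAddress;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hdfs.DFSClient;

  public class PrivateApiConsumer {
      public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          // The supported route would be FileSystem.get(conf); this code
          // reaches around it to an internal client class instead.
          DFSClient client = new DFSClient(
              new InetSocketAddress("namenode.example.com", 8020), conf);
          System.out.println(client.getFileInfo("/tmp"));
          client.close();
      }
  }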

3. Besides incompatibilities, new Hadoop releases can also introduce
performance regressions (lower throughput, higher latency, slower jobs, a
bigger memory footprint, or even memory leaks, etc.). While the performance
impact of migration (if any) could be negligible to some users, others could
be very sensitive and wish to roll back if it happens on their production
cluster.

Yes, bugs exist. I won't claim that 3.0.0 is bug-free. All new releases can 
potentially introduce new bugs.

However, I don't think rollback is the solution. In my experience, users
rarely roll back, since it's so disruptive and causes data loss. It's much
more common that they patch and upgrade. With that in mind, I'd rather we
spend our effort on making 3.0.x high-quality vs. making it easier to roll
back.

The root of my concern with announcing a "bridge release" is that it
discourages users from upgrading to 3.0.0 until a bridge release is out. I
strongly believe the level of quality provided by 3.0.0 is at least equal to
that of new 2.x minor releases, given our extended testing and integration
process, and we don't have bridge releases for 2.x.

This is why I asked for a list of known issues with 2.x -> 3.0 upgrades that
would necessitate a bridge release. Arun raised a concern about NM rollback.
Are there any other *known* issues?

Best,
Andrew
