Hi Andrew,

bq. Source and binary compatibility are not required for 3.0.0. It's a new major release, and there are known, documented incompatibilities in this regard.
Technically, that is true. In practice, however, we should retain compatibility as much as we can. Otherwise, we could unintentionally break downstream projects, third-party libraries, and existing users' applications. A quick example is the blocker issue I just reported in HADOOP-15059, which breaks old (2.x) MR applications on a 3.0 deployment due to a token format incompatibility.

bq. To follow up on my earlier email, I don't think there's need for a bridge release given that we've successfully tested rolling upgrade from 2.x to 3.0.0.

Did that testing hit the same issue as HADOOP-15059? If so, I'm curious what "rolling upgrade" means here - IMO, an upgrade that breaks running applications shouldn't be recognized as "rolling". Am I missing anything?

Thanks,

Junping

________________________________
From: Andrew Wang <andrew.w...@cloudera.com>
Sent: Wednesday, November 15, 2017 10:34 AM
To: Junping Du
Cc: Wangda Tan; Steve Loughran; Vinod Kumar Vavilapalli; Kai Zheng; Arun Suresh; common-...@hadoop.apache.org; yarn-...@hadoop.apache.org; Hdfs-dev; mapreduce-dev@hadoop.apache.org
Subject: Re: [DISCUSS] A final minor release off branch-2?

Hi Junping,

On Wed, Nov 15, 2017 at 1:37 AM, Junping Du <j...@hortonworks.com<mailto:j...@hortonworks.com>> wrote:

Thanks Vinod for bringing up this discussion, which is just in time. I agree with most responses that option C is not a good choice, as our community bandwidth is precious and we should focus on a very limited set of mainstream branches to develop, test, and deploy. Of course, we should still follow the Apache way and allow any interested committer to roll his/her own release for a specific requirement beyond the mainstream releases.

I am not biased towards option A or B (I will discuss this later), but I think a bridge release for upgrading to 3.x, and rolling back from it, is very necessary. The reasons are obvious:

1.
Given the lessons learned from the 1.x to 2.x migration, no matter how careful we try to be, there is still a chance that some level of compatibility (source, binary, configuration, etc.) gets broken in the move to a new major release. Some of these incompatibilities can only be identified at runtime, after the GA release is widely deployed on production clusters - we have tons of downstream projects and numerous configurations, and we cannot cover them all with in-house deployment and testing.

Source and binary compatibility are not required for 3.0.0. It's a new major release, and there are known, documented incompatibilities in this regard.

That said, we've done far, far more in this regard compared to previous major or minor releases. We've compiled all of CDH against Hadoop 3 and run our suite of system tests for the platform. We've been testing this way since 3.0.0-alpha1 and found and fixed plenty of source and binary compatibility issues during the alpha and beta process. Many of these fixes trickled down into 2.8 and 2.9.

2. From the recent classpath isolation work, I was surprised to find that many of our downstream projects (HBase, Tez, etc.) still consume non-public, server-side APIs of Hadoop, to say nothing of projects/products outside the Hadoop ecosystem. Our API compatibility tests do not (and should not) cover these cases and situations. We can claim that a new major release shouldn't be responsible for these private API changes, but given the possibility of breaking existing applications in some way, users could be very hesitant to migrate to a 3.x release if there is no safe way to roll back.

This is true for 2.x releases as well. Similar to the previous answer, we've compiled all of CDH against Hadoop 3, providing a much higher level of assurance even compared to 2.x releases.

3.
Besides incompatibilities, it is also possible for new Hadoop releases to have performance regressions (lower throughput, higher latency, slower job runs, a bigger memory footprint, or even memory leaks). While the performance impact of a migration (if any) could be negligible for some users, others could be very sensitive and wish to roll back if it happens on their production cluster.

Yes, bugs exist. I won't claim that 3.0.0 is bug-free. All new releases can potentially introduce new bugs. However, I don't think rollback is the solution. In my experience, users rarely roll back, since it's so disruptive and causes data loss. It's much more common that they patch and upgrade. With that in mind, I'd rather we spend our effort on making 3.0.x high-quality vs. making it easier to roll back.

The root of my concern in announcing a "bridge release" is that it discourages users from upgrading to 3.0.0 until a bridge release is out. I strongly believe the level of quality provided by 3.0.0 is at least equal to that of new 2.x minor releases, given our extended testing and integration process, and we don't have bridge releases for 2.x. This is why I asked for a list of known issues with 2.x -> 3.0 upgrades that would necessitate a bridge release. Arun raised a concern about NM rollback. Are there any other *known* issues?

Best,
Andrew
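[Editor's aside] The private-API point in item 2 above refers to Hadoop's interface-audience classification: only types annotated `@InterfaceAudience.Public` are covered by the compatibility policy, while `@Private` ones may change in any release even though downstream projects can still link against them. The following is a minimal, self-contained Java sketch of that idea; the annotation and class names below are simplified stand-ins for illustration, not Hadoop's actual `org.apache.hadoop.classification` code.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Simplified stand-in for Hadoop's InterfaceAudience annotations:
// @Public types fall under the compatibility policy; @Private types
// may change or disappear in any release, so downstream code that
// links against them can break on upgrade.
public class AudienceDemo {
    @Documented @Retention(RetentionPolicy.RUNTIME)
    @interface Public {}

    @Documented @Retention(RetentionPolicy.RUNTIME)
    @interface Private {}

    @Public
    static class StableClientApi {
        // Safe for downstream projects to depend on.
        static String greet() { return "hello"; }
    }

    @Private
    static class ServerInternals {
        // No compatibility guarantee for code linking against this.
        static String internalDetail() { return "may change"; }
    }

    // True only for types the (sketched) compatibility policy covers.
    static boolean isCoveredByCompatPolicy(Class<?> c) {
        return c.isAnnotationPresent(Public.class);
    }

    public static void main(String[] args) {
        System.out.println(isCoveredByCompatPolicy(StableClientApi.class)); // true
        System.out.println(isCoveredByCompatPolicy(ServerInternals.class)); // false
    }
}
```

A tool walking a downstream project's classpath with a check like `isCoveredByCompatPolicy` is roughly how one would detect the HBase/Tez-style private-API dependencies mentioned above.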