Hi Junping,

On Wed, Nov 15, 2017 at 1:37 AM, Junping Du <j...@hortonworks.com> wrote:

> Thanks Vinod for bringing up this discussion, which is just in time.
>
> I agree with most responses that option C is not a good choice, as our
> community bandwidth is precious and we should focus on a very limited
> set of mainstream branches to develop, test, and deploy. Of course, we
> should still follow the Apache way and allow any interested committer to
> roll his/her own release for specific requirements beyond the mainstream
> releases.
>
> I am not biased toward option A or B (I will discuss this later), but I
> think a bridge release for upgrading to, and rolling back from, 3.x is
> very necessary. The reasons are obvious:
> 1. Given the lessons learned from the migration from 1.x to 2.x, no
> matter how careful we try to be, there is still a chance that some level
> of compatibility (source, binary, configuration, etc.) gets broken in the
> migration to a new major release. Some of these incompatibilities can
> only be identified at runtime, after the GA release is widely deployed in
> production clusters - we have tons of downstream projects and numerous
> configurations, and we cannot cover them all with in-house deployment and
> testing.
>

Source and binary compatibility are not required for 3.0.0. It's a new
major release, and there are known, documented incompatibilities in this
regard.

That said, we've done far, far more here than for previous major or minor
releases. We've compiled all of CDH against Hadoop 3 and run our suite of
system tests for the platform. We've been testing this way since
3.0.0-alpha1, and we found and fixed plenty of source and binary
compatibility issues during the alpha and beta process. Many of these fixes
trickled down into 2.8 and 2.9.
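
To make that concrete, here is a minimal, hypothetical probe of the kind
of failure such testing catches (this is not how our test suite actually
works): it reflectively checks whether a method a downstream call site
links against still exists on the classpath. The class and method named
below are purely illustrative, not known removals.

    // Hypothetical sketch: a removed class or method surfaces here as
    // ClassNotFoundException / NoSuchMethodException, the same errors a
    // downstream application would hit at runtime on a new release.
    public class CompatProbe {
        public static void main(String[] args) {
            try {
                Class<?> fs = Class.forName("org.apache.hadoop.fs.FileSystem");
                fs.getMethod("getStatistics"); // illustrative call site only
                System.out.println("API present; this call site still links");
            } catch (ClassNotFoundException | NoSuchMethodException e) {
                System.out.println("Binary incompatibility detected: " + e);
            }
        }
    }

Compiling downstream sources against the new artifacts catches the
source-level equivalent of this at build time rather than in production.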

>
> 2. From recent classpath isolation work, I was surprised to find that
> many of our downstream projects (HBase, Tez, etc.) are still consuming
> non-public, server-side APIs of Hadoop, not to mention projects/products
> outside the Hadoop ecosystem. Our API compatibility tests do not (and
> should not) cover these cases. We can claim that a new major release
> shouldn't be responsible for these private API changes. But given the
> possibility of breaking existing applications in some way, users could be
> very hesitant to migrate to a 3.x release if there is no safe way to roll
> back.
>

This is true for 2.x releases as well. As in the previous answer: we've
compiled all of CDH against Hadoop 3, which provides a much higher level
of assurance than we have had even for 2.x releases.
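
For readers who haven't dug into this: Hadoop marks its API surface with
the InterfaceAudience annotations, and "non-public, server-side APIs"
means anything annotated Private or LimitedPrivate. The sketch below is
invented purely to illustrate the pattern Junping describes; the internal
class and its method are hypothetical, not real Hadoop code.

    import org.apache.hadoop.classification.InterfaceAudience;

    // How Hadoop marks server-side internals: Private-annotated types
    // carry no compatibility guarantee across releases.
    @InterfaceAudience.Private
    class ServerInternals {                 // hypothetical internal class
        static long leaseRenewIntervalMs() { return 30_000; }
    }

    // A downstream project reaching into such internals anyway. This
    // compiles today and can legitimately break in any release, which is
    // why API compatibility checks don't (and shouldn't) cover it.
    public class DownstreamUsage {
        public static void main(String[] args) {
            System.out.println("interval = "
                + ServerInternals.leaseRenewIntervalMs());
        }
    }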

>
> 3. Besides incompatibilities, it is also possible for new Hadoop releases
> to have performance regressions (lower throughput, higher latency, slower
> jobs, a bigger memory footprint, or even memory leaks, etc.). While the
> performance impact of migration (if any) could be negligible to some
> users, other users could be very sensitive and wish to roll back if it
> happens on their production cluster.
>

Yes, bugs exist. I won't claim that 3.0.0 is bug-free. All new releases
can potentially introduce new bugs.

However, I don't think rollback is the solution. In my experience, users
rarely roll back, since it's so disruptive and causes data loss. It's much
more common that they patch and upgrade. With that in mind, I'd rather we
spend our effort on making 3.0.x high-quality than on making it easier to
roll back.

The root of my concern with announcing a "bridge release" is that it
discourages users from upgrading to 3.0.0 until a bridge release is out. I
strongly believe the level of quality provided by 3.0.0 is at least equal
to new 2.x minor releases, given our extended testing and integration
process, and we don't have bridge releases for 2.x.

This is why I asked for a list of known issues with 2.x -> 3.0 upgrades
that would necessitate a bridge release. Arun raised a concern about NM
rollback. Are there any other *known* issues?

Best,
Andrew
