Thanks Vinod to bring up this discussion, which is just in time.

I agree with most responses that option C is not a good choice as our community 
bandwidth is precious and we should focus on very limited mainstream branches 
to develop, test and deployment. Of course, we should still follow Apache way 
to allow any interested committer for rolling up his/her own release given 
specific requirement over the mainstream releases.

I am not biased on option A or B (I will discuss this later), but I think a 
bridge release for upgrading to and back from 3.x is very necessary. 
The reasons are obviously:
1. Given lesson learned from previous experience of migration from 1.x to 2.x, 
no matter how careful we tend to be, there is still chance that some level of 
compatibility (source, binary, configuration, etc.) get broken for the 
migration to new major release. Some of these incompatibilities can only be 
identified in runtime after GA release with widely deployed in production 
cluster - we have tons of downstream projects and numerous configurations and 
we cannot cover them all from in-house deployment and test.

2. From recent classpath isolation work, I was surprised to find out that many 
of our downstream projects (HBase, Tez, etc.) are still consuming many 
non-public, server side APIs of Hadoop, not saying the projects/products 
outside of hadoop ecosystem. Our API compatibility test does not (and should 
not) cover these cases and situations. We can claim that new major release 
shouldn't be responsible for these private API changes. But given the 
possibility of breaking existing applications in some way, users could be very 
hesitated to migrate to 3.x release if there is no safe solution to roll back. 

3. Beside incompatibilities, there is also possible to have performance 
regressions (lower throughput, higher latency, slower job running, bigger 
memory footprint or even memory leaking, etc.) for new hadoop releases. While 
the performance impact of migration (if any) could be neglectable to some 
users, other users could be very sensitive and wish to roll back if it happens 
on their production cluster.

As Andrew mentioned in early email threads, some work has been done for 
verifying rolling upgrade from 2.x to 3.0 (just curious that which 2.x release 
is tested to upgrade from? 2.8.2 or 2.9.0 which is still in releasing?). But I 
am not aware any work we are doing now to test downgrade from 3.0 to 2.x 
(correct me if I miss any work). If users hit any of three situations I 
mentioned above then we should give them the chance to roll back if they are 
really conservative to these unexpected side-effect of upgrading. Given this, 
we should have this bridge release to cover the case for 3.0 safely roll back 
(no matter rolling or not). I am not sure it should be 2.9.x or 2.10.x for now 
(we can just call it 2.BR release) because we are not sure what exactly changes 
we should include for supporting roll back from 3.0 at this moment. We can 
defer this decision to discuss later when we have better ideas.

Summary for my two cents:
- No more feature release should happen on branch-2. 2.9 or 2.10 should be the 
last minor release (mainstream of community) on branch-2

- A bridge release is necessary for safely upgrade/downgrade to 3.x

- We can decide later to see if 2.10 is necessary when scope of the bridge 
release is more clear.


Thanks,

Junping

________________________________________
From: Andrew Wang <andrew.w...@cloudera.com>
Sent: Tuesday, November 14, 2017 2:25 PM
To: Wangda Tan
Cc: Steve Loughran; Vinod Kumar Vavilapalli; Kai Zheng; Arun Suresh; 
common-dev@hadoop.apache.org; yarn-...@hadoop.apache.org; Hdfs-dev; 
mapreduce-...@hadoop.apache.org
Subject: Re: [DISCUSS] A final minor release off branch-2?

To follow up on my earlier email, I don't think there's need for a bridge
release given that we've successfully tested rolling upgrade from 2.x to
3.0.0. I expect we'll keep making improvements to smooth over any
additional incompatibilities found, but there isn't a requirement that a
user upgrade to a bridge release before upgrading to 3.0.

Otherwise, I don't have a strong opinion about when to discontinue branch-2
releases. Historically, a release line is maintained until interest in it
wanes. If the maintainers are taking care of the backports, it's not much
work for the rest of us to vote on the RCs.

Best,
Andrew

On Mon, Nov 13, 2017 at 4:19 PM, Wangda Tan <wheele...@gmail.com> wrote:

> Thanks Vinod for staring this,
>
> I'm also leaning towards the plan (A):
>
>
>
>
> * (A)    -- Make 2.9.x the last minor release off branch-2    -- Have a
> maintenance release that bridges 2.9 to 3.x    -- Continue to make more
> maintenance releases on 2.8 and 2.9 as necessary*
>
> The only part I'm not sure is having a separate bridge release other than
> 3.x.
>
> For the bridge release, Steve's suggestion sounds more doable:
>
> ** 3.1+ for new features*
> ** fixes to 3.0.x &, where appropriate, 2.9, esp feature stabilisation*
> ** whoever puts their hand up to do 2.x releases deserves support in
> testing &c*
> ** If someone makes a really strong case to backport a feature from 3.x to
> branch-2 and its backwards compatible, I'm not going to stop them. It's
> just once 3.0 is out and a 3.1 on the way, it's less compelling*
>
> This makes community can focus on 3.x releases and fill whatever gaps of
> migrating from 2.x to 3.x.
>
> Best,
> Wangda
>
>
> On Wed, Nov 8, 2017 at 3:57 AM, Steve Loughran <ste...@hortonworks.com>
> wrote:
>
>>
>> > On 7 Nov 2017, at 19:08, Vinod Kumar Vavilapalli <vino...@apache.org>
>> wrote:
>> >
>> >
>> >
>> >
>> >> Frankly speaking, working on some bridging release not targeting any
>> feature isn't so attractive to me as a contributor. Overall, the final
>> minor release off branch-2 is good, we should also give 3.x more time to
>> evolve and mature, therefore it looks to me we would have to work on two
>> release lines meanwhile for some time. I'd like option C), and suggest we
>> focus on the recent releases.
>> >
>> >
>> >
>> > Answering this question is also one of the goals of my starting this
>> thread. Collectively we need to conclude if we are okay or not okay with no
>> longer putting any new feature work in general on the 2.x line after 2.9.0
>> release and move over our focus into 3.0.
>> >
>> >
>> > Thanks
>> > +Vinod
>> >
>>
>>
>> As a developer of new features (e.g the Hadoop S3A committers), I'm
>> mostly already committed to targeting 3.1; the code in there to deal with
>> failures and retries has unashamedly embraced java 8 lambda-expressions in
>> production code: backporting that is going to be traumatic in terms of
>> IDE-assisted code changes and the resultant diff in source between branch-2
>> & trunk. What's worse, its going to be traumatic to test as all my JVMs
>> start with an 8 at the moment, and I'm starting to worry about whether I
>> should bump a windows VM up to Java 9 to keep an eye on Akira's work there.
>> Currently the only testing I'm really doing on java 7 is yetus branch-2 &
>> internal test runs.
>>
>>
>> 3.0 will be out the door, and we can assume that CDH will ship with it
>> soon (*)  which will allow for a rapid round trip time on inevitable bugs:
>> 3.1 can be the release with compatibility tuned, those reported issues
>> addressed. It's certainly where I'd like to focus.
>>
>>
>> At the same time: 2.7.2-2.8.x are the broadly used versions, we can't
>> just say "move to 3.0" & expect everyone to do it, not given we have
>> explicitly got backwards-incompatible changes in. I don't seen people
>> rushing to do it until the layers above are all qualified (HBase, Hive,
>> Spark, ...). Which means big users of 2.7/2,8 won't be in a rush to move
>> and we are going to have to maintain 2.x for a while, including security
>> patches for old versions. One issue there: what if a patch (such as bumping
>> up a JAR version) is incompatible?
>>
>> For me then
>>
>> * 3.1+ for new features
>> * fixes to 3.0.x &, where appropriate, 2.9, esp feature stabilisation
>> * whoever puts their hand up to do 2.x releases deserves support in
>> testing &c
>> * If someone makes a really strong case to backport a feature from 3.x to
>> branch-2 and its backwards compatible, I'm not going to stop them. It's
>> just once 3.0 is out and a 3.1 on the way, it's less compelling
>>
>> -Steve
>>
>> Note: I'm implicitly assuming a timely 3.1 out the door with my work
>> included, all all issues arriving from 3,0 fixed. We can worry when 3.1
>> ships whether there's any benefit in maintaining a 3.0.x, or whether it's
>> best to say "move to 3.1"
>>
>>
>>
>> (*) just a guess based the effort & test reports of Andrew & others
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
>> For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org
>>
>>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org

Reply via email to