Thanks for your comments, Zheng. Replies inline.

> On the other hand, I've talked with quite a few potential 3.0 users, and it 
> looks like most of them are interested in the erasure coding feature; a 
> major scenario for it is backing up their large volumes of data to save 
> storage cost. They might run analytics workloads using Hive, Spark, Impala and 
> Kylin on a new cluster based on this version, but that isn't a must at 
> first. They understand there might be some gaps, so they'd migrate their 
> workloads incrementally. For the major analytics workloads, we've performed 
> lots of benchmark and integration tests (as have others, I believe). We 
> did find some issues, but they should be fixed in downstream projects. I 
> think the GA release will accelerate progress and expose issues, 
> if any. We can't wait for it to mature; nothing is perfect.


3.0 is a GA release from the Apache Hadoop community. So, we cannot assume that 
all usages in the short term are *only* going to be for storage optimization 
features and only on dedicated clusters. We have to make sure that the 
workloads can be migrated right now and/or that existing clusters can be 
upgraded in-place. If not, we shouldn't be calling it GA.


> This sounds like a good consideration. I'm thinking, if I'm a Hadoop user, for 
> example on 2.7.4 or 2.8.2 or whatever 2.x version, would I first 
> upgrade to this bridging release and then use the bridge support to upgrade to 
> a 3.x version? I'm not sure. On the other hand, I might tend to look for some 
> guides or support in the 3.x docs about how to upgrade from 2.7 to 3.x. 



Arun Suresh also asked this same question earlier. I think this will really 
depend on what we discover as part of the migration and user-acceptance 
testing. If we don't find major issues, you are right, folks can jump directly 
from one of 2.7, 2.8 or 2.9 to 3.0.



> Frankly speaking, working on some bridging release not targeting any feature 
> isn't so attractive to me as a contributor. Overall, the final minor release 
> off branch-2 is good, we should also give 3.x more time to evolve and mature, 
> therefore it looks to me we would have to work on two release lines meanwhile 
> for some time. I'd like option C), and suggest we focus on the recent 
> releases.



Answering this question is also one of my goals in starting this thread. 
Collectively, we need to decide whether we are okay with no longer putting 
new feature work, in general, on the 2.x line after the 2.9.0 release and 
moving our focus over to 3.0.


Thanks
+Vinod

> -----Original Message-----
> From: Vinod Kumar Vavilapalli [mailto:vino...@apache.org] 
> Sent: Tuesday, November 07, 2017 9:43 AM
> To: Andrew Wang <andrew.w...@cloudera.com>
> Cc: Arun Suresh <asur...@apache.org>; common-dev@hadoop.apache.org; 
> yarn-...@hadoop.apache.org; Hdfs-dev <hdfs-...@hadoop.apache.org>; 
> mapreduce-...@hadoop.apache.org
> Subject: Re: [DISCUSS] A final minor release off branch-2?
> 
> The main goal of the bridging release is to ease transition on stuff that is 
> guaranteed to be broken.
> 
> Off the top of my head, one of the biggest areas is application compatibility. 
> When folks move from 2.x to 3.x, are their apps binary compatible? Source 
> compatible? Or need changes?
> 
> In the 1.x -> 2.x upgrade, we did a bunch of work to at least make old apps 
> source compatible. This means relooking at API compatibility in 3.x and 
> its impact on migrating applications. We will have to revisit and 
> un-deprecate old APIs, un-delete old APIs and write documentation on how apps 
> can be migrated.
> 
> Most of this work will be in the 3.x line. The bridging release, on the other 
> hand, will have deprecations for APIs that cannot be undeleted. This may 
> already have been done in many places. But we need to make sure and fill gaps if any.
> 
> Other areas that I can recall from the old days
> - Config migration: Many configs are deprecated or deleted. We need 
> documentation to help folks to move. We also need deprecations in the 
> bridging release for configs that cannot be undeleted.
> - You mentioned rolling-upgrades: It will be good to exactly outline the type 
> of testing. For example, the rolling-upgrade orchestration order has direct 
> implications for the testing done.
> - Story for downgrades?
> - Copying data between 2.x clusters and 3.x clusters: Does this work already? 
> Is it broken anywhere that we cannot fix? Do we need bridging features for 
> this work?
> 
> +Vinod
> 
>> On Nov 6, 2017, at 12:49 PM, Andrew Wang <andrew.w...@cloudera.com> wrote:
>> 
>> What are the known gaps that need bridging between 2.x and 3.x?
>> 
>> From an HDFS perspective, we've tested wire compat, rolling upgrade, 
>> and rollback.
>> 
>> From a YARN perspective, we've tested wire compat and rolling upgrade. 
>> Arun just mentioned an NM rollback issue that I'm not familiar with.
>> 
>> Anything else? External to this discussion, these should be documented 
>> as known issues for 3.0.
>> 
>> Best.
>> Andrew
>> 
>> On Sun, Nov 5, 2017 at 1:46 PM, Arun Suresh <asur...@apache.org> wrote:
>> 
>>> Thanks for starting this discussion, Vinod.
>>> 
>>> I agree (C) is a bad idea.
>>> I would prefer (A) given that ATM, branch-2 is still very close to
>>> branch-2.9 - and it is a good time to make a collective decision to 
>>> lock down commits to branch-2.
>>> 
>>> I think we should also clearly define what the 'bridging' release 
>>> should be.
>>> I assume it means the following:
>>> * Any 2.x user wanting to move to 3.x must first upgrade to the 
>>> bridging release and then upgrade to the 3.x release.
>>> * With regard to state store upgrades (at least NM state stores) the 
>>> bridging state stores should be aware of all new 3.x keys so the 
>>> implicit assumption would be that a user can only rollback from the 
>>> 3.x release to the bridging release and not to the old 2.x release.
>>> * Use the opportunity to clean up deprecated APIs?
>>> * Do we even want to consider a separate bridging release for the 2.7, 
>>> 2.8 and 2.9 lines?
>>> 
>>> Cheers
>>> -Arun
>>> 
>>> On Fri, Nov 3, 2017 at 5:07 PM, Vinod Kumar Vavilapalli <vino...@apache.org> wrote:
>>> 
>>>> Hi all,
>>>> 
>>>> With 3.0.0 GA around the corner (tx for the push, Andrew!), 2.9.0 RC 
>>>> out (tx Arun / Subru!) and 2.8.2 (tx Junping!), I think it's high 
>>>> time we have a discussion on how we manage our developmental 
>>>> bandwidth between the 2.x and 3.x lines.
>>>> 
>>>> Once 3.0 GA goes out, we will have two parallel and major release lines.
>>>> The last time we were in this situation was back when we did 1.x -> 
>>>> 2.x jump.
>>>> 
>>>> Parallel releases imply the overhead of decisions, branch-merges 
>>>> and back-ports. Right now we already do backports for 2.7.5, 2.8.2, 
>>>> 2.9.1, 3.0.1 and potentially a 3.1.0 in a few months after 3.0.0 GA. 
>>>> And many of these lines, e.g. 2.8 and 2.9, are going to be used for a 
>>>> while at a bunch of large sites! At the same time, our users won't 
>>>> migrate to 3.0 GA overnight, so we do have to support two parallel lines.
>>>> 
>>>> I propose we start thinking of the fate of branch-2. The idea is to 
>>>> have one final release that helps our users migrate from 2.x to 3.x. 
>>>> This includes any changes on the older line to bridge compatibility 
>>>> issues, upgrade issues, layout changes, tooling etc.
>>>> 
>>>> We have a few options I think
>>>> (A)
>>>>   -- Make 2.9.x the last minor release off branch-2
>>>>   -- Have a maintenance release that bridges 2.9 to 3.x
>>>>   -- Continue to make more maintenance releases on 2.8 and 2.9 as 
>>>> necessary
>>>>   -- All new features obviously only go into the 3.x line as no 
>>>> features can go into the maint line.
>>>> 
>>>> (B)
>>>>   -- Create a new 2.10 release which doesn't have any new features, 
>>>> but as a bridging release
>>>>   -- Continue to make more maintenance releases on 2.8, 2.9 and 
>>>> 2.10 as necessary
>>>>   -- All new features, other than the bridging changes, go into the 
>>>> 3.x line
>>>> 
>>>> (C)
>>>>   -- Continue making branch-2 releases and postpone this discussion 
>>>> for later
>>>> 
>>>> I'm leaning towards (A) or to a lesser extent (B). Willing to hear 
>>>> otherwise.
>>>> 
>>>> Now, this obviously doesn't mean blocking of any more minor releases 
>>>> on branch-2. Obviously, any interested committer / PMC can roll up 
>>>> his/her sleeves, create a release plan and release, but we all need 
>>>> to acknowledge that versions are not cheap and figure out how the 
>>>> community bandwidth is split overall.
>>>> 
>>>> Thanks
>>>> +Vinod
>>>> PS: The proposal is obviously not to force everyone to go in one 
>>>> direction, but more to nudge the community to figure out if we can 
>>>> focus a major part of our bandwidth on one line. I had a similar 
>>>> concern when we were doing 2.8 and 3.0 in parallel, but the impending 
>>>> possibility of spreading too thin is much worse IMO.
>>>> PPS: (C) is a bad choice. With 2.8 and 2.9 we are already seeing 
>>>> user adoption splintering between two lines. With 2.10, 2.11 etc 
>>>> coexisting with 3.0, 3.1 etc, we will revisit the mad phase years ago 
>>>> when we had 0.20.x, 0.20-security coexisting with 0.21, 0.22 etc.
>>> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
> 


