Hi Quanlong,

Thanks for the suggestion. I wonder if there is a third strategy:

c) Isolate the Hadoop 2.x/3.x differences into a clearly defined driver layer so
that basically all of 3.x can be applied to the 2.x branch. Said another way, a 
single source base can work against either Hadoop 2.x or 3.x, with the build 
(C++) or runtime (Java) choosing the proper “driver” classes.
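
To make (c) a bit more concrete, here is a minimal sketch of the runtime (Java)
side. Every name in it (HadoopDriver, Hadoop2Driver, Hadoop3Driver,
DriverSelector) is hypothetical and not taken from the Impala code base, and
the single method stands in for whatever calls actually differ between the two
Hadoop lines. It only illustrates the shape of the idea: shared code talks to
one interface, and one small selector picks the implementation.

```java
// Hypothetical driver interface: shared code would depend only on this type.
interface HadoopDriver {
    String name();

    // Stand-in for any call whose API differs between Hadoop 2.x and 3.x.
    String describeFileSystem(String path);
}

// Would wrap the Hadoop 2.x client APIs (illustrative only).
final class Hadoop2Driver implements HadoopDriver {
    public String name() { return "hadoop2"; }

    public String describeFileSystem(String path) {
        return "2.x client for " + path;
    }
}

// Would wrap the Hadoop 3.x client APIs (illustrative only).
final class Hadoop3Driver implements HadoopDriver {
    public String name() { return "hadoop3"; }

    public String describeFileSystem(String path) {
        return "3.x client for " + path;
    }
}

final class DriverSelector {
    // Picks a driver from a version string such as "2.7.3" or "3.0.0".
    // Assumption: a real build could obtain this string from Hadoop itself,
    // e.g. org.apache.hadoop.util.VersionInfo.getVersion().
    static HadoopDriver forVersion(String hadoopVersion) {
        int dot = hadoopVersion.indexOf('.');
        String major = dot < 0 ? hadoopVersion : hadoopVersion.substring(0, dot);
        switch (major) {
            case "2": return new Hadoop2Driver();
            case "3": return new Hadoop3Driver();
            default:
                throw new IllegalArgumentException(
                    "Unsupported Hadoop version: " + hadoopVersion);
        }
    }

    public static void main(String[] args) {
        HadoopDriver driver = forVersion(args.length > 0 ? args[0] : "3.0.0");
        System.out.println(driver.name() + ": " + driver.describeFileSystem("/tmp"));
    }
}
```

On the C++ side the same choice would presumably be made at build time instead,
for example by compiling in one implementation or the other.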

This is the method used by Oracle, Informix and others back in the days when 
dozens of companies had their own “Unix standard.”

Anyone know the dependencies that differ between 2.x and 3.x? I’d guess they 
are large: HDFS, HMS, HBase, Hive and more… I wonder how hard it would be to 
factor those out of the code into a driver layer. What would be the cost of 
doing that vs. the cost of maintaining two divergent branches?

I’d be concerned that so many changes have gone into the 3.x branch that 
cherry-picking will get progressively more difficult, especially if commits are 
skipped. I saw this recently when we tried to back-port a patch from the 
3.x branch to the 2.x branch.

Thanks,

- Paul

> On Jan 27, 2019, at 7:09 PM, Quanlong Huang <huangquanl...@gmail.com> wrote:
> 
> Hi friends,
> 
> It's time to move branch-2.x forward. Though we've made great
> features/improvements in Impala-3.x, people’s impression of Impala is still
> in the 2.x era. Most of them are still using Hadoop2 in production and have
> no way to try Impala-3.x. I believe Hadoop2 will still be used for some
> years. It would be a pity if we lost those users.
> 
> I'd like to try moving branch-2.x forward. I hope you can give some
> suggestions! There are two proposals I can come up with:
> (a) Cherry-pick mature improvements/features into branch-2.x feature by
> feature.
> (b) Cherry-pick commits in branch-3.x one by one (skip those just for 3.x)
> 
> I summarized a "commits diff" among branch-3.x, branch-2.x and
> cloudera/cdh-5.16.1-release:
> https://docs.google.com/spreadsheets/d/12h1rTAPS1gm0vhlDGxeOXjnRD7rrOcoqzX4rjRRCyBg
> 
> It shows that the Cloudera release follows (a) and picks up only a few
> commits. However, it does pick up some commits in batches from branch-3.x
> (e.g. the LocalCatalog commits). I think it's a good example of (a).
> 
> However, (a) needs more effort than (b). If we go with (b), we just
> need to fix cherry-pick conflicts, run GVO and then merge the commit if the
> tests pass.
> 
> What do you think? Could anyone share some experience of how other
> projects (e.g. Hadoop, Hive, HBase) manage several branches together?
> 
> Thanks,
> Quanlong Huang
