Hi All,

I believe we're all on the same page about removing Kite, so I've opened SQOOP-3313 to track that.

@Attila I'm glad to see your interest in the ORC part. It would be highly appreciated if you could take a look at this review request.

I'm not that familiar with Flume, but it seems they added "NG" after architectural changes and released FlumeNG 1.0 after Flume 0.9.4. Even if we go with NG, I'd suggest calling it 3.0, to avoid confusion with earlier releases.

I think the biggest part of keeping Hadoop 2 (and previous versions of downstream projects like Hive) supported would be testing against them. It would also require at least one more build profile to build against them, and probably another layer of abstraction in the code (like the Hadoop shims in Hive). I'm not sure about vendors, but I think they usually don't add new features to older release lines. In my opinion we should branch off from current trunk to track the 1.x release line (where we keep supporting Hadoop 2) and keep adding bugfixes there, but add new features to trunk only and not worry about Hadoop 2 there.

I agree with Attila on the dependencies. We shouldn't release based on non-final releases. We might bump the dependencies to some alpha/beta during development, but we must not forget to move to the final versions in the end.

+1 for Bogi as release manager.

Regards,
Daniel

https://reviews.apache.org/r/66548/
https://blogs.apache.org/flume/entry/flume_ng_architecture

On Fri, Apr 13, 2018 at 5:24 PM Szabó Attila <mau...@inf.elte.hu> wrote:

> Hello everyone,
>
> I'd like to also attach my thoughts:
>
> New Sqoop version: Last time I had the chance to talk about this with
> some of the PMC members (e.g. Jarcec, Kate) we were in favor of creating
> Sqoop-NG (NG == Next Generation), much like what the Flume community did
> (and AFAIK from Mike Percy it was quite a successful move from their
> POV). Don't get me wrong, I'm totally NOT against 3.0, though IMHO
> Sqoop-NG 1.0 would be a better choice.
>
> Kite: I would totally split this effort into two subtasks.
> First I would get in contact with the Parquet team and create a
> Kite-independent execution path in Sqoop for the Parquet-backed tables
> (Hive/Impala/etc.). As part of this effort I would also add direct
> support for the ORC format (in the past few years I've found it very
> useful in several different situations, and it's usually quite
> inconvenient that Sqoop does not support it "out of the box").
>
> As the second subtask I would start to remove every Kite-based
> dependency (but my gut feeling is that it could break the codebase in
> too many places, and might not be that easy to succeed on that front).
>
> Hadoop 2:
>
> Could anyone please highlight what the pros and cons would be on this
> front? AFAIK several vendors (including Cloudera, Hortonworks, MapR,
> EMR, etc.) are still supporting Hadoop 2, and to the best of my
> knowledge most of the user base is tied to their releases, so I'd like
> to give those users the chance to use the newest features of Sqoop. Thus
> I would vote for keeping compatibility for a bit more time/versions.
>
> Dependencies:
>
> I'd like to cast my very direct and LOUD vote against any alpha
> dependencies (including HBase or anything else!). IMHO Sqoop is already
> a stable component of the Apache Foundation, and the users can depend on
> it, thus I'd like to avoid any kind of "immature" dependency related
> issues. Of course this is also just my solo opinion, but as a community
> I think we must not undermine our stability.
>
> On the other fronts I totally agree with and +1 the planned efforts.
>
> Best regards,
> Attila
>
> ________________________________
> From: Szabolcs Vasas <va...@apache.org>
> Sent: Friday, April 13, 2018 3:43 PM
> To: firstname.lastname@example.org
> Subject: Re: Release to support Hadoop 3
>
> Hi all,
>
> I also think that completely eliminating the Kite dependency from Sqoop
> would be the easiest way forward. I will try to analyze this topic a bit
> more next week and come up with subtasks so we could potentially work on
> it in parallel.
>
> I am happy with the Sqoop 3.0 scope proposal too, and with Bogi being
> its release manager.
>
> Szabolcs
>
>
> On Fri, Apr 13, 2018 at 2:37 PM, Boglarka Egyed <b...@apache.org> wrote:
>
> > Hi Daniel et al,
> >
> > Thanks for bringing up this topic and the detailed status update.
> >
> > I am sharing my thoughts point by point, please find them below.
> >
> > > 1) How to get a new Kite release? Maybe we should remove the Kite
> > > dependency altogether (as Szabolcs hinted in comments of SQOOP-3171)?
> >
> > I think making a new Kite release would be a huge effort: it would
> > require upgrading the versions, making the necessary code
> > modifications, testing everything thoroughly, etc., and then making
> > the release itself. Meanwhile Kite is a very passively maintained tool
> > with minimal activity, so it would definitely take a lot of effort to
> > get it done. It would also depend on the Solr community, as the
> > Morphlines module of Kite is heavily used and somewhat actively
> > developed by them. Also, there is indeed a shorter/longer term goal to
> > get rid of the Kite dependency in Sqoop entirely, i.e. all release
> > efforts would become throw-away very soon.
> >
> > Focusing on the Kite removal seems more reasonable to me. However, it
> > would be great to see an estimate for this effort. @Szabolcs could you
> > maybe share your thoughts on this?
> >
> > > 2) Should we drop support for Hadoop 2?
> >
> > I think we can drop support for Hadoop 2, especially if we use
> > straightforward versioning with the new release.
> >
> > > 3) What version number should we use? To avoid confusion with Sqoop2
> > > I'd go with 3.0.
> >
> > I like this idea, +1 for making a 3.0 release containing these changes.
> >
> > > 4) Does (should?) this affect the 1.5 release?
> >
> > I think the answer is yes. Currently the following breaking changes
> > are on the horizon which could be part of a next Sqoop release:
> > * com.cloudera package removal (done)
> > * Gradle introduction (in progress)
> > * Hadoop/Hive/HBase version upgrade (in progress)
> > * Kite deprecation/removal (planned)
> > * Bump Java version to 8 (planned)
> >
> > Looking at this list I would say that a Sqoop 1.5 release containing
> > only the com.cloudera package removal, the Gradle introduction and the
> > Java version bump would have a somewhat small and irrelevant scope
> > from a user perspective, so having two releases (1.5 and 3.0) might be
> > a bit of an overkill. I would instead suggest going with a Sqoop 3.0
> > release containing all the changes listed above. What do you think?
> >
> > To summarize, I currently see the following dependencies for a next
> > Sqoop release:
> > * Finishing up the Gradle patch
> > * Hive 3 release
> > * Kite removal - this could be the next common effort in the community
> >
> > Anyhow, I would be happy to take the Release Manager role for the next
> > release, please let me know if everyone would be OK with that.
> >
> > I am looking forward to seeing others' thoughts on this too.
> >
> > Many thanks,
> > Bogi
> >
> > On Thu, Apr 12, 2018 at 5:17 PM, Dániel Vörös <daniel.vo...@gmail.com>
> > wrote:
> >
> > > Dear All,
> > >
> > > After some development towards supporting Hadoop 3 (and latest version
> > > of downstream components) I'd like to summarize the current state of
> > > the upgrade and start the conversation about releasing a new version
> > > of Sqoop with Hadoop 3 support.
> > >
> > > Here's what happened so far:
> > > - Upgraded Hadoop dependency to 3.0.0
> > > - Hive had to be upgraded, since old Hive didn't work with Hadoop 3.
> > > - HBase had to be upgraded since Hive 3 depends on HBase 2 (alpha)
> > > - Dealt with a bunch of minor issues like changed Hadoop configuration
> > > names and different packaging of Maven artifacts.
> > >
> > > For details please refer to this ticket and the attached review
> > > request:
> > > https://issues.apache.org/jira/browse/SQOOP-3305
> > >
> > > Remaining work:
> > > - Parquet importing doesn't work. It was broken by a
> > > standalone-metastore change in Hive and fixing would require a new
> > > Kite version to be built against Hive 3.
> > > - Hive 3 is going to enable ACID tables by default. We should support
> > > importing into these. Details:
> > > https://issues.apache.org/jira/browse/SQOOP-3311
> > >
> > > Other blocking issues:
> > > - There's no Hive 3 release (no alpha/beta) yet.
> > >
> > > I'd like to kindly ask you all to share any other tasks/issues you
> > > know of that we should address to support the latest versions. Also,
> > > there are a couple open questions:
> > > 1) How to get a new Kite release? Maybe we should remove the Kite
> > > dependency altogether (as Szabolcs hinted in comments of SQOOP-3171)?
> > > 2) Should we drop support for Hadoop 2?
> > > 3) What version number should we use? To avoid confusion with Sqoop2
> > > I'd go with 3.0.
> > > 4) Does (should?) this affect the 1.5 release?
> > >
> > > Regards,
> > > Daniel