After some development work towards supporting Hadoop 3 (and the latest
versions of downstream components), I'd like to summarize the current state
of the upgrade and start a conversation about releasing a new version of
Sqoop with Hadoop 3 support.
Here's what has happened so far:
- Upgraded Hadoop dependency to 3.0.0
- Hive had to be upgraded, since the old Hive version didn't work with Hadoop 3.
- HBase had to be upgraded, since Hive 3 depends on HBase 2 (alpha).
- Dealt with a bunch of minor issues like changed Hadoop configuration
names and different packaging of Maven artifacts.
For details, please refer to this ticket and the attached review request:
Blocking issues:
- Parquet import doesn't work. It was broken by the standalone metastore
change in Hive, and fixing it would require a new Kite version built
against Hive 3.
- Hive 3 is going to enable ACID tables by default, so we should support
importing into them. Details:
Other blocking issues:
- There's no Hive 3 release yet (not even an alpha or beta).
I'd kindly ask you all to share any other tasks or issues you know of that
we should address to support the latest versions. There are also a couple
of open questions:
1) How do we get a new Kite release? Or should we remove the Kite
dependency altogether (as Szabolcs hinted in the comments of SQOOP-3171)?
2) Should we drop support for Hadoop 2?
3) What version number should we use? To avoid confusion with Sqoop2, I'd
go with 3.0.
4) Does (should?) this affect the 1.5 release?