Proposal for Sqoop 1.5

We currently have Sqoop 1.4.x, the production version of Sqoop, which supports ancient versions of Hadoop (from 0.20), Hive 0.7+ and HBase 0.94, among others.
There is a good amount of interest in contributing to Sqoop 1, as it is the current production version. But support for Hadoop 1.x is making it hard to bring new features into Sqoop 1.x (for example, getting the Phoenix changes into Sqoop, and potentially others waiting in the wings).

Also, we have been using an Ant/Ivy based build, which is causing issues with component version management. We could potentially use a Maven profile based configuration to allow multiple component versions easily, which would give us more flexibility in builds, in packaging and in how we publish artifacts.

To that end, here is what I propose, in order of priority (I had a brief discussion with Jarcec last week).

Create a new Sqoop 1.5 branch where we:

1. Deprecate support for Hadoop 1 and older versions of HBase (only support 1.0+) and Hive (only support 1.0+)
2. Mavenize the project
3. Clean up the package jumble in the code so that we only have org.apache.sqoop packages
4. Bring in all the new features that are otherwise difficult to bring in with the older versions

What should we do with the 1.4.x branch? My initial thought is that we do a 1.4.7 release with what is available and use 1.5.x as the branch for further changes.

Thoughts?

Thanks
Venkat