I wanted to start a conversation about moving to develop against Hive 3+ by default. (I describe this as Hive 3+ because it is close to Hive master, which is well beyond any released Hive 3.) There has been considerable development effort toward implementing features that integrate Impala with Hive 3+ and Hive ACID. This work is currently developed under the USE_CDP_HIVE=true configuration, while regular development has continued with Hive 2. The Hive 3+ development is now stable enough to be used for regular development. It would be nice to reduce our test and compatibility matrix and have a unified development environment.
Changing the major version of Hive is a breaking change, so it would require an Impala 4.x code line. I have a specific proposal, but this is mainly a frame for getting the discussion going.

I propose that we release Impala 3.4.0 and then update master to 4.0 and allow breaking changes until the Impala 4.0 release. The main breaking change would be to set USE_CDP_HIVE=true, enabling Hive 3+ development by default. The Hive 2 configuration would be removed over time. Other breaking changes can be proposed and voted on.

If there are developers interested in maintaining a 3.x branch, we can create this branch and add appropriate support to any infrastructure (e.g. bin/push_to_asf.py) to allow that.

Thoughts?

Thanks,
Joe McDonnell