I would like to start a conversation about moving to develop against Hive 3+
by default. (I describe this as Hive 3+ because it is close to Hive master,
which is well beyond any released Hive 3.) There has been considerable
development effort towards implementing features integrating Impala with
Hive 3+ and Hive ACID. This work is currently developed under the
USE_CDP_HIVE=true configuration, while regular development has continued
with Hive 2. The Hive 3+ development is now stable enough to be used for
regular development. It would be nice to reduce our test and compatibility
matrix and have a unified development environment.

Changing the major version of Hive is a breaking change, so it would
require an Impala 4.x code line. I have a specific proposal, but it is
mainly a starting point for getting the discussion going.

I propose that we release Impala 3.4.0 and then update master to 4.0 and
allow breaking changes until the Impala 4.0 release. The main breaking
change would be to set USE_CDP_HIVE=true, enabling Hive 3+ development by
default. The Hive 2 configuration would be removed over time. Other
breaking changes can be proposed and voted on.

If there are developers interested in maintaining a 3.x branch, we can
create this branch and add appropriate support to any infrastructure (e.g.
bin/push_to_asf.py) to allow that.

Thoughts?

Thanks,

Joe McDonnell
