Andrew Charneski created RANGER-2689:
----------------------------------------
Summary: Support multiple versions of Hive
Key: RANGER-2689
URL: https://issues.apache.org/jira/browse/RANGER-2689
Project: Ranger
Issue Type: Improvement
Components: plugins
Reporter: Andrew Charneski
Currently, Ranger supports only the latest version of Hive, 3.1.2. Unfortunately,
large segments of the big data community rely on older versions of Hive.
Two major examples:
# Spark SQL uses a forked version of Hive 1.2.1
(https://spark.apache.org/docs/latest/sql-migration-guide-hive-compatibility.html)
# EMR provides Hive only up to 2.3.5
(https://docs.aws.amazon.com/emr/latest/ReleaseGuide/Hive-release-history.html)
In order to support these internally, my organization has prepared two
modifications of Ranger to link against these versions. These are illustrated
in the PRs https://github.com/acharneski/ranger/pull/4 and
https://github.com/apache/ranger/pull/51
We would like to eliminate the need for entirely separate builds of Ranger to
support these versions, and instead integrate the variants into the main Ranger
codebase. We are willing to do the bulk of the implementation, but would first
like to discuss the architecture of this change so that we build it in a way
the Ranger committers would be amenable to adopting.
My initial thought is to split the `hive-agent` module into something like
`hive-agent-base`, `hive-agent-1`, `hive-agent-2`, and `hive-agent-3`. This
would allow us to explicitly link to each major version of Hive while
minimizing the duplication of code. Thoughts?
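To make the proposed split concrete, here is a minimal sketch of how shared code could pick a version-specific implementation. Everything in it is hypothetical and for illustration only: the `HiveShim` interface, the `VersionedShim` class, and the `forVersion` lookup are not existing Ranger or Hive APIs, and a single class stands in for what would really be separate classes in `hive-agent-1`, `hive-agent-2`, and `hive-agent-3`, each compiled against its own Hive release.

```java
import java.util.HashMap;
import java.util.Map;

public class HiveAgentSketch {

    // Hypothetical contract that would live in `hive-agent-base`.
    // Shared authorization logic would depend only on this interface.
    interface HiveShim {
        int majorVersion();
        String moduleName();
    }

    // Stand-in for the version-specific modules. In a real split, each
    // major version would have its own class in its own module, linked
    // against the matching Hive release.
    static final class VersionedShim implements HiveShim {
        private final int major;

        VersionedShim(int major) {
            this.major = major;
        }

        @Override
        public int majorVersion() {
            return major;
        }

        @Override
        public String moduleName() {
            return "hive-agent-" + major;
        }
    }

    // Registry keyed by Hive major version (assumed here to be 1, 2, and 3).
    private static final Map<Integer, HiveShim> SHIMS = new HashMap<>();

    static {
        for (int major = 1; major <= 3; major++) {
            SHIMS.put(major, new VersionedShim(major));
        }
    }

    // Select the shim from a version string such as "2.3.5".
    static HiveShim forVersion(String hiveVersion) {
        int major = Integer.parseInt(hiveVersion.split("\\.")[0]);
        HiveShim shim = SHIMS.get(major);
        if (shim == null) {
            throw new IllegalArgumentException(
                    "Unsupported Hive version: " + hiveVersion);
        }
        return shim;
    }

    public static void main(String[] args) {
        // The three versions mentioned in this issue.
        for (String version : new String[] {"1.2.1", "2.3.5", "3.1.2"}) {
            System.out.println(version + " -> " + forVersion(version).moduleName());
        }
    }
}
```

The point of the sketch is only that the base module never references a specific Hive release directly; whether selection happens at runtime, as here, or at build time through separate artifacts is an open design question.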
Thank you!