Hi, All.

First of all, I want to put this forward as a policy issue rather than a
technical issue.
Also, this is orthogonal to the `hadoop` version discussion.

The Apache Spark community has kept (not maintained) the forked Apache Hive
1.2.1 because there was no other option before. As we can see in
SPARK-20202, this is not a desirable situation among Apache projects.

    https://issues.apache.org/jira/browse/SPARK-20202

Also, please note that I say `kept`, not `maintained`, because we know it's
not good.
There have been several attempts to update that forked repository
for several reasons (Hadoop 3 support is one example),
but those attempts were also turned down.

From Apache Spark 3.0, it seems that we have a new feasible option, the
`hive-2.3` profile. What about moving further in this direction?

For example, can we officially and completely remove the usage of the forked
`hive` in Apache Spark 3.0? If someone still needs to use the forked `hive`,
we can have a `hive-1.2` profile. Of course, it should not be the default
profile in the community.
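
To make that concrete, here is a rough sketch of what the two builds could
look like, assuming Spark's existing `build/mvn` wrapper and the proposed
profile names (the exact flags are illustrative, not a final design):

    # Default build: Hive 2.3.x artifacts from Maven Central, no forked hive
    ./build/mvn -DskipTests clean package

    # Opt-in build for users who still need the forked Hive 1.2.1
    ./build/mvn -Phive-1.2 -DskipTests clean package

Keeping the forked dependency behind an explicit, non-default profile would
let us drop it from the default artifacts without breaking such users right
away.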

I want to say this is a goal we should achieve someday.
If we don't do anything, nothing will happen. At least we need to prepare for
this. Without any preparation, Spark 3.1+ will be in the same situation.

Shall we focus on what our actual problems with Hive 2.3.6 are?
If the only reason is that we haven't used it before, we can release another
`3.0.0-preview` for that.

Bests,
Dongjoon.
