Hi, All.

First of all, I want to put this forward as a policy issue instead of a technical issue. Also, this is orthogonal to the `hadoop` version discussion.
The Apache Spark community has kept (not maintained) the forked Apache Hive 1.2.1 because there was no other option before. As we can see in SPARK-20202, it's not a desirable situation among Apache projects. https://issues.apache.org/jira/browse/SPARK-20202

Also, please note that we `kept`, not `maintained`, because we know it's not good. There have been several attempts to update that forked repository for several reasons (Hadoop 3 support is one example), but those attempts were also turned down.

From Apache Spark 3.0, it seems that we have a new feasible option, the `hive-2.3` profile. What about moving further in this direction? For example, can we completely and officially remove the usage of the forked `hive` in Apache Spark 3.0? If someone still needs to use the forked `hive`, we can have a `hive-1.2` profile. Of course, it should not be the default profile in the community.

I want to say this is a goal we should achieve someday. If we don't do anything, nothing will happen. At the very least, we need to prepare for this. Without any preparation, Spark 3.1+ will be in the same situation.

Shall we focus on what our problems with Hive 2.3.6 actually are? If the only reason is that we haven't used it before, we can release another `3.0.0-preview` for that.

Bests,
Dongjoon.