[
https://issues.apache.org/jira/browse/SPARK-30643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024062#comment-17024062
]
Dongjoon Hyun edited comment on SPARK-30643 at 1/27/20 2:28 AM:
----------------------------------------------------------------
It sounds like a misunderstanding of the role of embedded Hive. It's just used
to talk to the Hive metastore.
{quote}But if I chose to run Hive 3 and Spark with embedded Hive 2.3, then
SparkSQL and Hive queries behavior could differ in some cases.
{quote}
Everything (SQL Parser/Analyzer/Optimizer and execution engine) is Spark's own
code. So, in general, the embedded Hive 1.2/2.3 doesn't make a difference. The
exceptional cases might be Hive bugs. For example, Spark 3.0.0 will ship with
Hive 1.2 and Hive 2.3 (default), and all UTs passed in both environments with
the same results.
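As a rough illustration of that separation, the metastore client version is
chosen through Spark SQL configuration rather than through the Hive version
Spark was compiled against. The sketch below only assembles spark-submit flags
for the documented settings spark.sql.hive.metastore.version and
spark.sql.hive.metastore.jars. The helper names and the "3.1.2" version string
are hypothetical, and whether a given version is accepted depends on the Spark
release in use.

```python
# Sketch only: builds spark-submit flags that pick the Hive metastore client
# version separately from the Hive version embedded in the Spark build.
# Helper names and the example version are assumptions for illustration.

def metastore_client_conf(version, jars="maven"):
    """Return Spark SQL settings that select the metastore client."""
    return {
        # Version of the Hive metastore that Spark should talk to.
        "spark.sql.hive.metastore.version": version,
        # Where the client jars come from: "builtin", "maven", or a classpath.
        "spark.sql.hive.metastore.jars": jars,
    }


def to_submit_args(conf):
    """Turn a settings dict into a flat list of --conf arguments."""
    args = []
    for key in sorted(conf):
        args.extend(["--conf", f"{key}={conf[key]}"])
    return args


if __name__ == "__main__":
    # Hypothetical example: a Hive 3.x metastore with a Spark build that
    # embeds Hive 2.3.
    print(" ".join(to_submit_args(metastore_client_conf("3.1.2"))))
```

The SQL parser, analyzer, optimizer, and execution engine are unaffected by
these settings; only the client used to reach the metastore changes.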
For the following, I don't think Apache Spark needs to have Hive 1.2, Hive 2.3,
and Hive 3.1 in the Apache Spark 3.x era. Adding 2.3 took too much effort from
the Apache Spark community, so it couldn't happen in Apache Spark 2.x. Maybe we
can consider that for Apache Spark 4.0 if there are many users running Hive 3.x
stably in production (not beta).
{quote}I think that majority of reasons that went into support of embedding
Hive 2.3 will apply to support of embedding Hive 3.
{quote}
> Add support for embedding Hive 3
> --------------------------------
>
> Key: SPARK-30643
> URL: https://issues.apache.org/jira/browse/SPARK-30643
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Igor Dvorzhak
> Priority: Major
>
> Currently Spark can be compiled only against Hive 1.2.1 and Hive 2.3,
> compilation fails against Hive 3.