[jira] [Comment Edited] (SPARK-30643) Add support for embedding Hive 3

Dongjoon Hyun (Jira) Sun, 26 Jan 2020 18:29:25 -0800


    [ 
https://issues.apache.org/jira/browse/SPARK-30643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024062#comment-17024062
 ]


Dongjoon Hyun edited comment on SPARK-30643 at 1/27/20 2:28 AM:
----------------------------------------------------------------

It sounds like a misunderstanding of the role of embedded Hive. It's just used 
to talk to the Hive metastore.
{quote}But if I chose to run Hive 3 and Spark with embedded Hive 2.3, then 
SparkSQL and Hive queries behavior could differ in some cases.
{quote}
Everything (SQL Parser/Analyzer/Optimizer and execution engine) is Spark's own 
code. So, in general, the embedded Hive 1.2/2.3 doesn't make a difference. The 
exceptional cases might be Hive bugs. For example, Spark 3.0.0 will ship with 
Hive 1.2 and Hive 2.3 (default), and all UTs passed in both environments with 
the same results.
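
As a rough illustration of the point above (the embedded Hive only acts as a 
metastore client), here is a minimal sketch that builds `spark-submit` flags 
from the `spark.sql.hive.metastore.version` and `spark.sql.hive.metastore.jars` 
settings. The helper name `metastore_conf_args` and the version string "3.1.2" 
are assumptions for illustration only; which metastore versions are accepted 
depends on the Spark release in use.

```python
def metastore_conf_args(metastore_version, jars="maven"):
    """Hypothetical helper: build spark-submit --conf flags that select the
    Hive metastore client version, independent of the built-in Hive."""
    conf = {
        # Version of the Hive metastore that Spark should talk to.
        "spark.sql.hive.metastore.version": metastore_version,
        # Where the matching client jars come from: "builtin", "maven",
        # or a classpath.
        "spark.sql.hive.metastore.jars": jars,
    }
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


# Assumed example value; check the Spark documentation for supported versions.
print(" ".join(["spark-submit"] + metastore_conf_args("3.1.2")))
```

These flags affect only the metastore client; parsing, optimization, and 
execution stay in Spark's own code regardless of their values.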

For the following, I don't think Apache Spark needs to have Hive 1.2, Hive 2.3, 
and Hive 3.1 in the Apache Spark 3.x era. Adding 2.3 took too much effort from 
the Apache Spark community, so it couldn't happen in Apache Spark 2.x. Maybe we 
can consider that for Apache Spark 4.0 if there are many users running Hive 3.x 
stably in production (not beta).
{quote}I think that majority of reasons that went into support of embedding 
Hive 2.3 will apply to support of embedding Hive 3.
{quote}


was (Author: dongjoon):
It sounds like a misunderstanding of the role of embedded Hive. It's just used 
to talk to the Hive metastore.
> But if I chose to run Hive 3 and Spark with embedded Hive 2.3, then SparkSQL 
> and Hive queries behavior could differ in some cases.

Everything (SQL Parser/Analyzer/Optimizer and execution engine) is Spark's own 
code. So, in general, the embedded Hive 1.2/2.3 doesn't make a difference. The 
exceptional cases might be Hive bugs. For example, Spark 3.0.0 will ship with 
Hive 1.2 and Hive 2.3 (default), and all UTs passed in both environments with 
the same results.

I don't think Apache Spark needs to have Hive 1.2, Hive 2.3, and Hive 3.1 in 
the Apache Spark 3.x era. Adding 2.3 took too much effort from the Apache Spark 
community, so it couldn't happen in Apache Spark 2.x. Maybe we can consider 
that for Apache Spark 4.0 if there are many users running Hive 3.x stably in 
production (not beta).
> I think that majority of reasons that went into support of embedding Hive 2.3 
> will apply to support of embedding Hive 3.


> Add support for embedding Hive 3
> --------------------------------
>
>                 Key: SPARK-30643
>                 URL: https://issues.apache.org/jira/browse/SPARK-30643
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Igor Dvorzhak
>            Priority: Major
>
> Currently Spark can be compiled only against Hive 1.2.1 and Hive 2.3; 
> compilation fails against Hive 3.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
