[jira] [Commented] (SPARK-6906) Improve Hive integration support

Michael Armbrust (JIRA) Tue, 25 Aug 2015 18:44:35 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-6906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712336#comment-14712336
 ]


Michael Armbrust commented on SPARK-6906:
-----------------------------------------

Hey Tom, sorry for the confusion.  The docs are updated here: 
https://github.com/apache/spark/pull/8441

We only ever compile Spark against Hive 1.2.1 now.  You do need the Hadoop jars 
too, since Hive metastore operations often check file permissions and do other 
validation.  We could in theory use the ones from the assembly jar, but this is 
complicated by the fact that much of Hive actually lives in the 
{{org.apache.hadoop}} namespace.  So, for simplicity, we don't share any of 
those classes across the classloader barrier.
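
As a rough illustration of the point about needing both Hive and Hadoop jars, 
here is a minimal sketch that builds the spark-submit arguments for talking to 
an older metastore.  The two property names are the Spark SQL metastore 
settings; the helper name {{metastore_conf_args}}, the version string, and the 
install directories are hypothetical, and the {{/*}} wildcard entries assume 
your Spark version expands them (otherwise list the jars explicitly).

```python
import os


def metastore_conf_args(metastore_version, jar_dirs):
    """Build spark-submit --conf arguments for a non-builtin Hive metastore.

    jar_dirs must cover Hive *and* Hadoop, since the metastore client is
    loaded in isolation and does not reuse the Hadoop classes bundled in
    the Spark assembly.
    """
    # JVM-style classpath: one wildcard entry per directory, joined with
    # the platform path separator (':' on Unix, ';' on Windows).
    classpath = os.pathsep.join(d.rstrip("/") + "/*" for d in jar_dirs)
    return [
        "--conf", "spark.sql.hive.metastore.version=" + metastore_version,
        "--conf", "spark.sql.hive.metastore.jars=" + classpath,
    ]


# Hypothetical install locations; substitute your own.
args = metastore_conf_args(
    "0.13.1",
    ["/opt/hive-0.13.1/lib", "/opt/hadoop/share/hadoop/common"],
)
print(" ".join(args))
```

Execution still uses the Hive version Spark was compiled against; only the 
metastore client is loaded from the classpath given here.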

> Improve Hive integration support
> --------------------------------
>
>                 Key: SPARK-6906
>                 URL: https://issues.apache.org/jira/browse/SPARK-6906
>             Project: Spark
>          Issue Type: Story
>          Components: SQL
>            Reporter: Michael Armbrust
>            Assignee: Michael Armbrust
>            Priority: Blocker
>             Fix For: 1.5.0
>
>
> Right now Spark SQL is tightly coupled to a specific version of Hive for two 
> primary reasons.
>  - Metadata: we use the Hive Metastore client to retrieve information about 
> tables in a metastore.
>  - Execution: UDFs, UDAFs, SerDes, HiveConf and various helper functions for 
> configuration.
> Since Hive is generally not compatible across versions, we currently 
> maintain fairly expensive shim layers to let us talk to both Hive 12 and Hive 
> 13 metastores.  Ideally we would be able to talk to more versions of Hive 
> with less maintenance burden.
> This task proposes that we separate the Hive version that is used for 
> communicating with the metastore from the version that is used for execution. 
> In doing so we can significantly reduce the size of the shim by only 
> providing compatibility for metadata operations.  All execution will be done 
> with a single version of Hive (the newest version that is supported by Spark 
> SQL).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

