[jira] [Commented] (SPARK-6906) Improve Hive integration support

Thomas Graves (JIRA) Tue, 25 Aug 2015 12:43:20 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-6906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711843#comment-14711843
 ]


Thomas Graves commented on SPARK-6906:
--------------------------------------

Can someone who did this work explain how this is now supposed to work?  I 
can't seem to build against Hive 0.13.1 if I change hive.version.  I'm 
assuming that's because the 1.2.1.spark version has some special modifications?

Does that now mean I have to use spark.sql.hive.metastore.jars to point to the 
jars, and I can't bundle them in the assembly jar?  Do I still need to use -Phive?

Does this also mean I have to send them along as jars when I run on YARN?

Which jars really need to be in metastore.jars?  The docs say both Hive and 
Hadoop, but why Hadoop if it's bundled in the assembly jar?  Going by the name 
of the variable, I would expect to point only to the Hive metastore jars, not 
all Hive jars or Hadoop jars.

Neither the build docs nor the SQL docs have been updated, from what I can tell.

> Improve Hive integration support
> --------------------------------
>
>                 Key: SPARK-6906
>                 URL: https://issues.apache.org/jira/browse/SPARK-6906
>             Project: Spark
>          Issue Type: Story
>          Components: SQL
>            Reporter: Michael Armbrust
>            Assignee: Michael Armbrust
>            Priority: Blocker
>             Fix For: 1.5.0
>
>
> Right now Spark SQL is tightly coupled to a specific version of Hive for two 
> primary reasons.
>  - Metadata: we use the Hive Metastore client to retrieve information about 
> tables in a metastore.
>  - Execution: UDFs, UDAFs, SerDes, HiveConf and various helper functions for 
> configuration.
> Since Hive is generally not compatible across versions, we currently 
> maintain fairly expensive shim layers to let us talk to both Hive 12 and Hive 
> 13 metastores.  Ideally we would be able to talk to more versions of Hive 
> with less maintenance burden.
> This task is proposing that we separate the hive version that is used for 
> communicating with the metastore from the version that is used for execution. 
>  In doing so we can significantly reduce the size of the shim by only 
> providing compatibility for metadata operations.  All execution will be done 
> with a single version of Hive (the newest version that is supported by Spark 
> SQL).
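
To make the metastore/execution split above concrete, here is a minimal sketch of how the two metastore settings might be assembled and passed to spark-submit. The property names spark.sql.hive.metastore.version and spark.sql.hive.metastore.jars are Spark SQL settings; the helper functions, the jar directories, and the use of JVM-style wildcard classpath entries are assumptions for illustration only. It does not settle the questions raised in the comment about assembly bundling or YARN.

```python
# Sketch only: the jar directories passed in below are hypothetical placeholders.

def metastore_conf(version, jar_dirs):
    """Return Spark SQL settings that select the Hive metastore client version."""
    # Assumption: the jars setting takes a JVM-style classpath; each directory
    # is given a "/*" wildcard and the entries are joined with ":".
    classpath = ":".join(f"{d}/*" for d in jar_dirs)
    return {
        "spark.sql.hive.metastore.version": version,
        "spark.sql.hive.metastore.jars": classpath,
    }


def to_submit_args(conf):
    """Render the settings as spark-submit --conf arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


# Hypothetical locations for the Hive 0.13.1 and Hadoop jars.
conf = metastore_conf("0.13.1", ["/opt/hive-0.13.1/lib", "/opt/hadoop/lib"])
print(" ".join(to_submit_args(conf)))
```

The execution side is not configured here: per the description, it always uses the single Hive version that Spark SQL is built against.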



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

