[ 
https://issues.apache.org/jira/browse/SPARK-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078891#comment-14078891
 ] 

Brock Noland commented on SPARK-2741:
-------------------------------------

Yes, after looking into it more: to include Hive in the assembly you have to 
build with the hive profile. However, it appears the default Hadoop 2 tarball 
was built with the Hive profile enabled.

My vision is that users can use Hive + Spark with relatively little effort, 
that is, without having to build Spark themselves and then deploy a custom 
build. Here are four options:

1) Don't include Hive in the spark assembly; require users to provide their 
own version of Hive.

2) Don't include Hive in the spark assembly; instead ship it in, say, 
opt/hive-0.12. Users could then set a flag to enable/disable including Hive 
on the classpath.

3) Ship a separate lib directory without Hive, say "opt/lib-without-hive", and 
then document that users who want to use their own version of Hive execute the 
following command:

{noformat}
mv lib opt/lib-with-hive && mv opt/lib-without-hive lib
{noformat}

4) Shade Hive within the spark assembly.

I'd vote for 2 or 3, since they're not disruptive to existing users and not 
technically dubious like 4.
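To make option 2 concrete, a launcher could gate the extra jars behind an 
opt-in environment variable. This is only an illustrative sketch: the 
`SPARK_HIVE` flag, the `opt/hive-0.12` layout, and the paths below are 
hypothetical names for this proposal, not existing Spark configuration.

```shell
# Hypothetical sketch of option 2: Hive jars live in opt/hive-0.12 and are
# appended to the classpath only when the (illustrative) SPARK_HIVE flag is
# set to "true" by the user.
SPARK_HOME="${SPARK_HOME:-/opt/spark}"
CLASSPATH="$SPARK_HOME/lib/*"
if [ "${SPARK_HIVE:-false}" = "true" ]; then
  CLASSPATH="$CLASSPATH:$SPARK_HOME/opt/hive-0.12/*"
fi
echo "$CLASSPATH"
```

With the flag unset, existing users get exactly today's classpath; setting it 
pulls in the bundled Hive jars without a custom build.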

> Publish version of spark assembly which does not contain Hive
> -------------------------------------------------------------
>
>                 Key: SPARK-2741
>                 URL: https://issues.apache.org/jira/browse/SPARK-2741
>             Project: Spark
>          Issue Type: Task
>            Reporter: Brock Noland
>
> The current spark assembly contains Hive. This conflicts with Hive + Spark, 
> which is attempting to use its own version of Hive.
> We'll need to publish a version of the assembly which does not contain the 
> Hive jars.



--
This message was sent by Atlassian JIRA
(v6.2#6252)