[
https://issues.apache.org/jira/browse/SPARK-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078891#comment-14078891
]
Brock Noland commented on SPARK-2741:
-------------------------------------
Yes, after looking into it more, to include Hive in the assembly you have to
build with the hive profile. However, it appears the default Hadoop 2 tarball
was built with the hive profile enabled.
My vision is that users can use Hive + Spark with relatively little effort,
i.e. without having to build Spark themselves and then deploy a custom build.
Here are four options:
1) Don't include Hive in the spark assembly, requiring users to supply their
own version of Hive.
2) Don't include Hive in the spark assembly; instead ship it in, say,
opt/hive-0.12. Users could then enable/disable some flag to include Hive on the
classpath.
3) Have a separate lib directory without Hive, say "opt/lib-without-hive", and
then document that users who want to use their own version of Hive should
execute the following command:
{noformat}
mv lib opt/lib-with-hive && mv opt/lib-without-hive lib
{noformat}
4) Shade hive within the spark assembly
I'd vote for 2 or 3, since unlike 1 they are not disruptive to existing users,
and unlike 4 they are not technically dubious.
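Option 2 could be sketched roughly as below. Note this is only an illustration:
the SPARK_HIVE variable, the SPARK_HOME default, and the opt/hive-0.12 layout
are hypothetical names, not taken from the actual Spark launch scripts.
{noformat}
# Hypothetical sketch of option 2: prepend the bundled Hive jars to the
# classpath only when the user opts in. All names here are assumptions.
spark_classpath() {
  home="${SPARK_HOME:-/opt/spark}"
  cp="$home/lib/*"
  if [ "$SPARK_HIVE" = "true" ]; then
    # User opted in: put the shipped Hive jars ahead of the core libs
    cp="$home/opt/hive-0.12/*:$cp"
  fi
  printf '%s\n' "$cp"
}
{noformat}
A user wanting the bundled Hive would set SPARK_HIVE=true before launching;
everyone else would get a Hive-free classpath by default.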
> Publish version of spark assembly which does not contain Hive
> -------------------------------------------------------------
>
> Key: SPARK-2741
> URL: https://issues.apache.org/jira/browse/SPARK-2741
> Project: Spark
> Issue Type: Task
> Reporter: Brock Noland
>
> The current spark assembly contains Hive. This conflicts with Hive + Spark,
> which is attempting to use its own version of Hive.
> We'll need to publish a version of the assembly which does not contain the
> Hive jars.
--
This message was sent by Atlassian JIRA
(v6.2#6252)