[
https://issues.apache.org/jira/browse/HIVE-14240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15510628#comment-15510628
]
Sahil Takiar commented on HIVE-14240:
-------------------------------------
[~Ferd], [~lirui] yes, I forgot that there are two ways the qtests get run on
Spark: one is local-cluster mode and the other is yarn-client mode. I believe
the dependency on a SPARK_HOME directory is present in both modes. So unless we
can figure out a way to change this in Spark, I think we still need the
dependency on the Spark distribution.
> HoS itests shouldn't depend on a Spark distribution
> ---------------------------------------------------
>
> Key: HIVE-14240
> URL: https://issues.apache.org/jira/browse/HIVE-14240
> Project: Hive
> Issue Type: Improvement
> Components: Spark
> Affects Versions: 2.0.0, 2.1.0, 2.0.1
> Reporter: Sahil Takiar
> Assignee: Sahil Takiar
>
> The HoS integration tests download a full Spark distribution (a tar-ball)
> from CloudFront and use it to run Spark locally. Some tests run Spark in
> embedded mode, and others run against a local Spark on YARN cluster. The
> {{itests/pom.xml}} actually contains scripts to download the tar-ball from a
> pre-defined location.
> This is problematic because the Spark Distribution shades all its
> dependencies, including Hadoop dependencies. This can cause problems when
> upgrading the Hadoop version for Hive (ref: HIVE-13930).
> Removing it will also avoid having to download the tar-ball during every
> build, and simplify the build process for the itests module.
> The Hive itests should instead directly depend on Spark artifacts published
> in Maven Central. It will require some effort to get this working. The
> current Hive Spark Client uses a launch script in the Spark installation to
> run Spark jobs. The script basically does some setup work and invokes
> org.apache.spark.deploy.SparkSubmit. It is possible to invoke this class
> directly, which avoids the need to have a full Spark distribution available
> locally (in fact this option already exists, but isn't tested).
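> The direct-invocation idea above could look roughly like this (a minimal
> sketch only; the jar names, classpath layout, and driver class shown are
> illustrative assumptions, not the actual Hive Spark Client wiring):
> ```shell
> # Instead of shelling out to $SPARK_HOME/bin/spark-submit, put the Spark
> # Maven artifacts on the classpath and invoke SparkSubmit's main class
> # directly, so no local Spark distribution is required.
> java -cp "spark-core.jar:spark-yarn.jar:hive-exec.jar" \
>   org.apache.spark.deploy.SparkSubmit \
>   --master yarn --deploy-mode client \
>   --class org.apache.hive.spark.client.RemoteDriver \
>   hive-exec.jar
> ```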
> There may be other issues around classpath conflicts between Hive and Spark.
> For example, Hive and Spark require different versions of Kryo. One solution
> would be to take the Spark artifacts and shade Kryo inside them.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)