[
https://issues.apache.org/jira/browse/HIVE-14240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15508403#comment-15508403
]
Rui Li commented on HIVE-14240:
-------------------------------
We have two kinds of tests for HoS: TestSparkCliDriver runs on local-cluster,
and TestMiniSparkOnYarnCliDriver runs on a mini YARN cluster. I know
local-cluster is not intended to be used outside Spark. So if local-cluster
causes trouble for this task, I think it's acceptable to migrate the qtests in
TestSparkCliDriver to TestMiniSparkOnYarnCliDriver.
> HoS itests shouldn't depend on a Spark distribution
> ---------------------------------------------------
>
> Key: HIVE-14240
> URL: https://issues.apache.org/jira/browse/HIVE-14240
> Project: Hive
> Issue Type: Improvement
> Components: Spark
> Affects Versions: 2.0.0, 2.1.0, 2.0.1
> Reporter: Sahil Takiar
> Assignee: Sahil Takiar
>
> The HoS integration tests download a full Spark distribution (a tar-ball)
> from CloudFront and use it to run Spark locally. A few tests run Spark in
> embedded mode, and some run against a local Spark-on-YARN cluster. The
> {{itests/pom.xml}} contains scripts that download the tar-ball from a
> pre-defined location.
> This is problematic because the Spark Distribution shades all its
> dependencies, including Hadoop dependencies. This can cause problems when
> upgrading the Hadoop version for Hive (ref: HIVE-13930).
> Removing this dependency would also avoid downloading the tar-ball during
> every build, and would simplify the build process for the itests module.
> The Hive itests should instead depend directly on Spark artifacts published
> to Maven Central. It will take some effort to get this working. The current
> Hive Spark Client uses a launch script from the Spark installation to run
> Spark jobs. The script basically does some setup work and then invokes
> org.apache.spark.deploy.SparkSubmit. It is possible to invoke this class
> directly, which avoids the need to have a full Spark distribution available
> locally (in fact this option already exists, but isn't tested).
> There may be other issues around classpath conflicts between Hive and Spark.
> For example, Hive and Spark require different versions of Kryo. One solution
> to this would be to take the Spark artifacts and shade Kryo inside them.
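As a rough illustration of the technique described in the issue (calling the submit entry point in-process instead of forking the launch script), here is a minimal, hypothetical Java sketch. FakeSubmit is a stand-in for org.apache.spark.deploy.SparkSubmit, whose real main method would need the Spark jars on the classpath; the class name lookup and flag values are illustrative only.

```java
import java.lang.reflect.Method;

// Stand-in for org.apache.spark.deploy.SparkSubmit (hypothetical;
// the real class needs the Spark jars on the classpath).
class FakeSubmit {
    public static void main(String[] args) {
        System.out.println("submitted with master=" + args[1]);
    }
}

public class DirectSubmit {
    public static void main(String[] args) throws Exception {
        // Load the submit class and invoke its static main directly,
        // skipping the bin/spark-submit shell script entirely.
        Class<?> submit = Class.forName("FakeSubmit");
        Method entry = submit.getMethod("main", String[].class);
        entry.invoke(null, (Object) new String[] {"--master", "local[*]"});
        // prints: submitted with master=local[*]
    }
}
```

The same pattern applied to the real SparkSubmit class is what would let the itests run with only Maven-resolved Spark artifacts on the classpath, with no unpacked distribution on disk.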
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)