[jira] [Commented] (HIVE-14240) HoS itests shouldn't depend on a Spark distribution

Ferdinand Xu (JIRA) Tue, 20 Sep 2016 18:15:41 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-14240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15508335#comment-15508335
 ]


Ferdinand Xu commented on HIVE-14240:
-------------------------------------

Thanks [~stakiar] for your input. 
AFAIK, TestSparkCliDriver needs SparkSubmit to submit jobs, which requires 
SPARK_HOME to point to a Spark distribution because it tests Spark on YARN. 
[~kellyzly] [~mohitsabharwal], please correct me if any of these statements 
is wrong. Pig doesn't require a Spark distribution because its integration 
tests only cover Spark standalone mode.


> HoS itests shouldn't depend on a Spark distribution
> ---------------------------------------------------
>
>                 Key: HIVE-14240
>                 URL: https://issues.apache.org/jira/browse/HIVE-14240
>             Project: Hive
>          Issue Type: Improvement
>          Components: Spark
>    Affects Versions: 2.0.0, 2.1.0, 2.0.1
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>
> The HoS integration tests download a full Spark distribution (a tar-ball) 
> from CloudFront and use it to run Spark locally. A few tests run with Spark 
> in embedded mode, and some run against a local Spark on YARN cluster. The 
> {{itests/pom.xml}} actually contains scripts to download the tar-ball from a 
> pre-defined location.
> This is problematic because the Spark Distribution shades all its 
> dependencies, including Hadoop dependencies. This can cause problems when 
> upgrading the Hadoop version for Hive (ref: HIVE-13930).
> Removing it will also avoid having to download the tar-ball during every 
> build, and simplify the build process for the itests module.
> The Hive itests should instead directly depend on Spark artifacts published 
> in Maven Central. It will require some effort to get this working. The 
> current Hive Spark Client uses a launch script in the Spark installation to 
> run Spark jobs. The script basically does some setup work and invokes 
> org.apache.spark.deploy.SparkSubmit. It is possible to invoke this class 
> directly, which avoids the need to have a full Spark distribution available 
> locally (in fact this option already exists, but isn't tested).
> There may be other issues around classpath conflicts between Hive and Spark. 
> For example, Hive and Spark require different versions of Kryo. One solution 
> to this would be to take Spark artifacts and shade Kryo inside them.
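
The direct-invocation idea in the description above (launching
org.apache.spark.deploy.SparkSubmit from a classpath instead of going through
the launch script in a Spark installation) can be sketched roughly as follows.
This is only an illustration under assumptions, not the code in Hive's Spark
Client: the class name SparkSubmitCommand, the build helper, and the example
paths and application class are all hypothetical. Only the SparkSubmit class
name and the --master / --class options come from Spark itself.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: assembles a JVM command line that runs SparkSubmit
// directly, so no SPARK_HOME or launch script is needed.
class SparkSubmitCommand {
    // Entry point that the Spark launch script eventually invokes.
    static final String SPARK_SUBMIT_CLASS = "org.apache.spark.deploy.SparkSubmit";

    static List<String> build(String classpath, String master,
                              String mainClass, String appJar) {
        List<String> cmd = new ArrayList<>();
        // Use the same JVM that is running the tests.
        cmd.add(System.getProperty("java.home") + File.separator
                + "bin" + File.separator + "java");
        cmd.add("-cp");
        cmd.add(classpath);           // assumed to hold the Spark artifacts from Maven
        cmd.add(SPARK_SUBMIT_CLASS);
        cmd.add("--master");
        cmd.add(master);
        cmd.add("--class");
        cmd.add(mainClass);
        cmd.add(appJar);
        return cmd;
    }

    public static void main(String[] args) {
        // Example values only; none of these paths or names come from Hive.
        List<String> cmd = build("/tmp/spark-jars/*", "local[2]",
                "com.example.MyApp", "/tmp/my-app.jar");
        System.out.println(String.join(" ", cmd));
        // To actually launch: new ProcessBuilder(cmd).inheritIO().start();
    }
}
```

Whether a classpath assembled this way avoids the Kryo and Hadoop conflicts
mentioned in the description is a separate question that this sketch does not
address.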



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
