[
https://issues.apache.org/jira/browse/HIVE-14240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15510628#comment-15510628
]
Sahil Takiar commented on HIVE-14240:
-------------------------------------
[~Ferd], [~lirui] yes, I forgot that there are two ways the qtests get run on
Spark: one is local-cluster mode and the other is yarn-client mode. I believe
the dependency on a SPARK_HOME directory is present in both modes. So unless we
can figure out a way to change this in Spark, I think we still need the
dependency on the Spark distribution.
> HoS itests shouldn't depend on a Spark distribution
> ---------------------------------------------------
>
> Key: HIVE-14240
> URL: https://issues.apache.org/jira/browse/HIVE-14240
> Project: Hive
> Issue Type: Improvement
> Components: Spark
> Affects Versions: 2.0.0, 2.1.0, 2.0.1
> Reporter: Sahil Takiar
> Assignee: Sahil Takiar
>
> The HoS integration tests download a full Spark distribution (a tar-ball)
> from CloudFront and use it to run Spark locally. Some tests run Spark in
> embedded mode, and others run against a local Spark on YARN cluster. The
> {{itests/pom.xml}} actually contains scripts to download the tar-ball from a
> pre-defined location.
> This is problematic because the Spark Distribution shades all its
> dependencies, including Hadoop dependencies. This can cause problems when
> upgrading the Hadoop version for Hive (ref: HIVE-13930).
> Removing it will also avoid having to download the tar-ball during every
> build, and simplify the build process for the itests module.
> The Hive itests should instead directly depend on Spark artifacts published
> in Maven Central. It will require some effort to get this working. The
> current Hive Spark Client uses a launch script in the Spark installation to
> run Spark jobs. The script basically does some setup work and invokes
> org.apache.spark.deploy.SparkSubmit. It is possible to invoke this class
> directly, which avoids the need to have a full Spark distribution available
> locally (in fact this option already exists, but isn't tested).
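> The direct-invocation idea above could look roughly like this (a minimal
> sketch only; the jar names, classpath layout, and driver class shown are
> illustrative assumptions, not the actual Hive Spark Client wiring):
> ```shell
> # Instead of shelling out to $SPARK_HOME/bin/spark-submit, put the Spark
> # Maven artifacts on the classpath and invoke SparkSubmit's main class
> # directly, so no local Spark distribution is required.
> java -cp "spark-core.jar:spark-yarn.jar:hive-exec.jar" \
>   org.apache.spark.deploy.SparkSubmit \
>   --master yarn --deploy-mode client \
>   --class org.apache.hive.spark.client.RemoteDriver \
>   hive-exec.jar
> ```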
> There may be other issues around classpath conflicts between Hive and Spark.
> For example, Hive and Spark require different versions of Kryo. One solution
> would be to take the Spark artifacts and shade Kryo inside them.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)