[
https://issues.apache.org/jira/browse/OOZIE-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Kanter updated OOZIE-2277:
---------------------------------
Attachment: OOZIE-2277.002.patch
The 002 patch should be correct now. With Marcelo and Hari's help, and a lot of
trial and error, I was able to figure out which configs need to be set, and to
what values, for local, yarn-client, and yarn-cluster modes. I put a large
comment in {{SparkMain}} explaining what needs to be done for each mode. The
patch also makes the {{SparkConfigurationService}} ignore the {{spark.yarn.jar}}
property, as it can conflict with the Spark jars in the sharelib, especially in
yarn-client mode. I also changed the Spark sharelib pom a bit to make sure all
of the dependencies we need are included. The patch got a little more
complicated, so I put it up on RB: https://reviews.apache.org/r/37452/
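To make the {{spark.yarn.jar}} handling concrete, here is a minimal sketch (not
the patch code; the class and method names are hypothetical) of dropping that
property from a loaded Spark config so it cannot override the Spark jars
shipped in the sharelib:
{code:java}
import java.util.Map;
import java.util.Properties;

// Hypothetical illustration, not the actual SparkConfigurationService code:
// copy a loaded spark-defaults.conf into a Properties object while dropping
// spark.yarn.jar, so it cannot conflict with the sharelib's Spark jars.
public class SparkConfigFilter {

    private static final String SPARK_YARN_JAR = "spark.yarn.jar";

    public static Properties filterSparkConfig(Map<String, String> loadedConfig) {
        Properties filtered = new Properties();
        for (Map.Entry<String, String> entry : loadedConfig.entrySet()) {
            if (!SPARK_YARN_JAR.equals(entry.getKey())) {
                filtered.setProperty(entry.getKey(), entry.getValue());
            }
        }
        return filtered;
    }
}
{code}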
> Honor oozie.action.sharelib.for.spark in Spark jobs
> ---------------------------------------------------
>
> Key: OOZIE-2277
> URL: https://issues.apache.org/jira/browse/OOZIE-2277
> Project: Oozie
> Issue Type: Improvement
> Reporter: Ryan Brush
> Assignee: Robert Kanter
> Priority: Minor
> Attachments: OOZIE-2277.001.patch, OOZIE-2277.002.patch
>
>
> Shared libraries specified by oozie.action.sharelib.for.spark are not visible
> in the Spark job itself. For instance, setting
> oozie.action.sharelib.for.spark to "spark,hcat" will not make the hcat jars
> usable in the Spark job. This is inconsistent with other actions (such as
> Java and MapReduce actions).
>
> Since the Spark action just calls SparkSubmit, it looks like we would need to
> explicitly pass the jars for the specified sharelibs into the SparkSubmit
> invocation so that they are available to the Spark job itself.
>
> One option: we can just pass the HDFS URLs to that command via the --jars
> parameter. This is actually what I've done to work around this issue; it
> makes for a long SparkSubmit command but works.
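As a minimal sketch of the --jars workaround described in the issue above
(the jar paths and class names are made up for illustration; the actual
sharelib layout will differ), the sharelib jars' HDFS URIs can be joined into
a single comma-separated --jars argument:
{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the --jars workaround: the HDFS URIs of the
// sharelib jars are joined into one comma-separated --jars value so that
// SparkSubmit makes them available inside the Spark job.
public class SparkSubmitArgs {

    public static List<String> buildArgs(String mainClass, String appJar,
                                         List<String> sharelibJarUris) {
        List<String> args = new ArrayList<>();
        args.add("--master");
        args.add("yarn-cluster");
        args.add("--class");
        args.add(mainClass);
        args.add("--jars");
        args.add(String.join(",", sharelibJarUris));  // long, but works
        args.add(appJar);
        return args;
    }

    public static void main(String[] unused) {
        // Example HDFS URIs (paths are made up for illustration).
        List<String> hcatJars = Arrays.asList(
                "hdfs:///user/oozie/share/lib/hcatalog/hive-hcatalog-core.jar",
                "hdfs:///user/oozie/share/lib/hcatalog/hive-metastore.jar");
        System.out.println(buildArgs("com.example.MyApp",
                "hdfs:///apps/myapp/myapp.jar", hcatJars));
    }
}
{code}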
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)