[jira] [Commented] (OOZIE-2277) Honor oozie.action.sharelib.for.spark in Spark jobs

Marcelo Vanzin (JIRA) Tue, 11 Aug 2015 14:57:06 -0700

    [ https://issues.apache.org/jira/browse/OOZIE-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692325#comment-14692325 ]


Marcelo Vanzin commented on OOZIE-2277:
---------------------------------------

[~rkanter] the Jackson error is caused by the way we build Spark in CDH. The
Jackson libs need to be on the system classpath, and you can only control that
using the extraClassPath options ({{spark.driver.extraClassPath}} and
{{spark.executor.extraClassPath}}) or with {{SPARK_DIST_CLASSPATH}}.

Spark is normally packaged with all of its dependencies, so I suppose you
wouldn't run into that there. (Well, in 1.4 there's a build that omits certain
dependencies provided by the Hadoop libraries, but I think Jackson is still
included.)

{{--jars}} does not modify the system classpath, which is why you can't use it
to work around the Jackson issue. If the stuff you're adding is
application-specific and not a Spark dependency, {{--jars}} is probably the
better bet (also because, as Tom says, it distributes the files for you
automatically).
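To make the distinction above concrete, here is a minimal Java sketch of how a launcher might build the two kinds of SparkSubmit arguments. The class name {{SparkClasspathArgs}}, the {{build}} method and the jar paths are hypothetical and are not part of Oozie or Spark. The sketch assumes Linux hosts, so the classpath separator is ':'. Jars that Spark itself needs (such as Jackson in the CDH build) go on the system classpath via the extraClassPath options. Those paths are not shipped by Spark and must already exist on every node. Application-specific jars go to {{--jars}}, which Spark distributes but does not put on the system classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not Oozie or Spark API: builds SparkSubmit arguments
// that separate system-classpath jars from application jars.
public class SparkClasspathArgs {

    public static List<String> build(List<String> systemJars, List<String> appJars) {
        List<String> args = new ArrayList<>();
        if (!systemJars.isEmpty()) {
            // extraClassPath is a classpath string (':'-separated, assuming Linux hosts).
            // Spark does NOT ship these files; they must exist at the same local
            // path on the driver and executor hosts.
            String cp = String.join(":", systemJars);
            args.add("--conf");
            args.add("spark.driver.extraClassPath=" + cp);
            args.add("--conf");
            args.add("spark.executor.extraClassPath=" + cp);
        }
        if (!appJars.isEmpty()) {
            // --jars takes a comma-separated list; Spark distributes these files,
            // but they are not added to the system classpath.
            args.add("--jars");
            args.add(String.join(",", appJars));
        }
        return args;
    }

    public static void main(String[] unused) {
        // Hypothetical paths, for illustration only.
        System.out.println(build(
                Arrays.asList("/opt/jackson/jackson-core.jar", "/opt/jackson/jackson-databind.jar"),
                Arrays.asList("hdfs:///user/me/lib/my-udfs.jar")));
    }
}
```

With this split, a Spark dependency that must be visible to Spark's own classes is never routed through {{--jars}}, which would not help for the reason given above.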

> Honor oozie.action.sharelib.for.spark in Spark jobs
> ---------------------------------------------------
>
>                 Key: OOZIE-2277
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2277
>             Project: Oozie
>          Issue Type: Improvement
>            Reporter: Ryan Brush
>            Assignee: Robert Kanter
>            Priority: Minor
>         Attachments: OOZIE-2277.001.patch
>
>
> Shared libraries specified by oozie.action.sharelib.for.spark are not visible 
> in the Spark job itself. For instance, setting 
> oozie.action.sharelib.for.spark to "spark,hcat" will not make the hcat jars 
> usable in the Spark job. This is inconsistent with other actions (such as 
> Java and MapReduce actions).
> Since the Spark action just calls SparkSubmit, it looks like we would need to
> explicitly pass the jars for the specified sharelibs into the SparkSubmit
> operation so they are available to the Spark job itself.
> One option is to just pass the HDFS URLs to that command via the --jars
> parameter. This is actually what I've done to work around this issue; it
> makes for a long SparkSubmit command, but it works.
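The workaround in the description can be sketched in Java. The sketch resolves the value of oozie.action.sharelib.for.spark (e.g. "spark,hcat") to jar URLs and passes them as one comma-separated {{--jars}} value ahead of the application jar. spark-submit options must precede the primary resource, which is why {{--jars}} goes first. The class {{SharelibJarsArg}}, its methods, the in-memory sharelib map and the HDFS paths are all hypothetical. Oozie's real sharelib resolution and the attached patch are not reproduced here. The sketch also assumes the caller has not already passed {{--jars}}; a real implementation would need to merge with any user-supplied value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the "--jars with sharelib HDFS URLs" workaround.
public class SharelibJarsArg {

    // Resolve e.g. "spark,hcat" against a (hypothetical) map of
    // sharelib name -> jar URLs. LinkedHashSet keeps first-seen order and
    // drops jars listed by more than one sharelib; unknown names are skipped.
    public static String jarsFor(String sharelibProperty, Map<String, List<String>> sharelibs) {
        Set<String> jars = new LinkedHashSet<>();
        for (String name : sharelibProperty.split(",")) {
            List<String> libJars = sharelibs.get(name.trim());
            if (libJars != null) {
                jars.addAll(libJars);
            }
        }
        return String.join(",", jars);
    }

    // Build the SparkSubmit argument list: options, then "--jars <list>",
    // then the application jar and its arguments.
    public static List<String> withJars(List<String> optionArgs, String appJar,
                                        List<String> appArgs, String jars) {
        List<String> args = new ArrayList<>(optionArgs);
        if (!jars.isEmpty()) {
            args.add("--jars");
            args.add(jars);
        }
        args.add(appJar);
        args.addAll(appArgs);
        return args;
    }

    public static void main(String[] unused) {
        Map<String, List<String>> sharelibs = new LinkedHashMap<>();
        sharelibs.put("spark", Arrays.asList("hdfs:///hypothetical/sharelib/spark/a.jar"));
        sharelibs.put("hcat", Arrays.asList("hdfs:///hypothetical/sharelib/hcat/b.jar"));
        String jars = jarsFor("spark,hcat", sharelibs);
        System.out.println(withJars(
                Arrays.asList("--master", "yarn-cluster", "--class", "com.example.MyJob"),
                "hdfs:///user/me/app.jar", Arrays.asList("in", "out"), jars));
    }
}
```

Joining every jar into a single {{--jars}} value is what makes the command long, as noted above. Deduplicating keeps it from growing further when sharelibs overlap.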



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
