[
https://issues.apache.org/jira/browse/OOZIE-2456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Kanter resolved OOZIE-2456.
----------------------------------
Resolution: Duplicate
> spark action can not find pyspark module
> ----------------------------------------
>
> Key: OOZIE-2456
> URL: https://issues.apache.org/jira/browse/OOZIE-2456
> Project: Oozie
> Issue Type: Bug
> Components: action, client, core
> Affects Versions: 4.1.0
> Environment: Ubuntu 14.04.3
> Reporter: Ming Hsuan Tu
>
> I have a Spark script written in PySpark, and I want to submit it via an
> Oozie spark action, something like this:
> {code:xml}
> <action name="myapp">
>     <spark xmlns="uri:oozie:spark-action:0.1">
>         <job-tracker>${job_tracker}</job-tracker>
>         <name-node>${name_node}</name-node>
>         <master>local[*]</master>
>         <name>myapp</name>
>         <jar>${my_script}</jar>
>         <spark-opts>--executor-memory 4G --num-executors 4</spark-opts>
>         <arg>${arg1}</arg>
>     </spark>
>     <ok to="hive_import"/>
>     <error to="send_email"/>
> </action>
> {code}
> The script imports the pyspark module:
> {code:text}
> #!/usr/bin/spark-submit
> from pyspark import SparkContext
> from pyspark import SparkFiles
> sc = SparkContext()
> {code}
> However, Oozie throws a "Can not import pyspark module" exception.
> This started happening when I upgraded from CDH 5.4.6 to CDH 5.5.1.
> The workaround would be to use the shell action, but I think the spark
> action describes a Spark task better.
> Any suggestion?
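> For reference, a minimal sketch of the shell-action workaround I mean
> (the wrapper script name and file mappings here are assumptions for
> illustration, not from my actual setup):
> {code:xml}
> <action name="myapp-shell">
>     <shell xmlns="uri:oozie:shell-action:0.3">
>         <job-tracker>${job_tracker}</job-tracker>
>         <name-node>${name_node}</name-node>
>         <!-- wrapper script that calls spark-submit on the PySpark file -->
>         <exec>run_spark.sh</exec>
>         <argument>${arg1}</argument>
>         <!-- ship the wrapper and the PySpark script to the launcher -->
>         <file>run_spark.sh#run_spark.sh</file>
>         <file>${my_script}#myapp.py</file>
>     </shell>
>     <ok to="hive_import"/>
>     <error to="send_email"/>
> </action>
> {code}
> where run_spark.sh would just invoke spark-submit on myapp.py, so the
> pyspark module is resolved by spark-submit itself rather than by the
> Oozie spark action launcher.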
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)