Ming Hsuan Tu created OOZIE-2456:
------------------------------------
Summary: Spark action cannot find pyspark module
Key: OOZIE-2456
URL: https://issues.apache.org/jira/browse/OOZIE-2456
Project: Oozie
Issue Type: Bug
Components: action, client, core
Affects Versions: 4.1.0
Environment: Ubuntu 14.04.3
Reporter: Ming Hsuan Tu
I have a Spark script written in PySpark, and I want to submit it via the Oozie
spark action, something like this:
{code:xml}
<action name="myapp">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${job_tracker}</job-tracker>
        <name-node>${name_node}</name-node>
        <master>local[*]</master>
        <name>myapp</name>
        <jar>${my_script}</jar>
        <spark-opts>--executor-memory 4G --num-executors 4</spark-opts>
        <arg>${arg1}</arg>
    </spark>
    <ok to="hive_import"/>
    <error to="send_email"/>
</action>
{code}
The script imports the pyspark module:
{code:python}
#!/usr/bin/spark-submit
from pyspark import SparkContext
from pyspark import SparkFiles

sc = SparkContext()
{code}
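To see where the import breaks, a minimal diagnostic version of the script can print the interpreter's search path before touching pyspark; whether pyspark.zip and the py4j zip are actually missing from the launcher's path is an assumption to verify, not something I have confirmed:
{code:python}
#!/usr/bin/spark-submit
# Print the module search path before importing pyspark, to check whether
# pyspark.zip and the py4j zip are visible to the Oozie launcher at all.
import sys
print(sys.path)

from pyspark import SparkContext  # fails here if pyspark is not on sys.path
{code}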
However, Oozie throws a "Can not import pyspark module" exception.
This started happening when I upgraded from CDH 5.4.6 to CDH 5.5.1.
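A possible mitigation, which I have not verified, might be to ship the PySpark libraries to the job explicitly via --py-files; the parcel paths and py4j version below are assumptions for a CDH install and would need adjusting:
{code:xml}
<spark-opts>--executor-memory 4G --num-executors 4
    --py-files /opt/cloudera/parcels/CDH/lib/spark/python/lib/pyspark.zip,/opt/cloudera/parcels/CDH/lib/spark/python/lib/py4j-0.8.2.1-src.zip</spark-opts>
{code}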
A workaround would be to drive spark-submit from a shell action instead, but I
think the spark action describes a Spark task better.
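For reference, a rough sketch of that shell-action fallback; submit_myapp.sh is a hypothetical wrapper that calls spark-submit on the script, and the schema version may differ on the cluster:
{code:xml}
<action name="myapp-shell">
    <shell xmlns="uri:oozie:shell-action:0.2">
        <job-tracker>${job_tracker}</job-tracker>
        <name-node>${name_node}</name-node>
        <exec>submit_myapp.sh</exec>
        <argument>${arg1}</argument>
        <!-- ship both the wrapper and the PySpark script with the action -->
        <file>submit_myapp.sh</file>
        <file>${my_script}</file>
    </shell>
    <ok to="hive_import"/>
    <error to="send_email"/>
</action>
{code}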
Any suggestions?
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)