Ming Hsuan Tu created OOZIE-2456:
------------------------------------

             Summary: spark action can not find pyspark module
                 Key: OOZIE-2456
                 URL: https://issues.apache.org/jira/browse/OOZIE-2456
             Project: Oozie
          Issue Type: Bug
          Components: action, client, core
    Affects Versions: 4.1.0
         Environment: Ubuntu 14.04.3
            Reporter: Ming Hsuan Tu


I have a Spark script written in PySpark, and I want to submit it via the Oozie 
spark action, something like this:

{code:xml}
  <action name="myapp">
      <spark xmlns="uri:oozie:spark-action:0.1">
          <job-tracker>${job_tracker}</job-tracker>
          <name-node>${name_node}</name-node>
          <master>local[*]</master>
          <name>myapp</name>
          <jar>${my_script}</jar>
          <spark-opts>--executor-memory 4G --num-executors 4</spark-opts>
          <arg>${arg1}</arg>
      </spark>
      <ok to="hive_import"/>
      <error to="send_email"/>
  </action>
{code}

The script imports pyspark module:

{code:text}
#!/usr/bin/spark-submit
from pyspark import SparkContext
from pyspark import SparkFiles
sc = SparkContext()
{code}

However, Oozie throws a "Can not import pyspark module" exception.
This started happening after I upgraded from CDH 5.4.6 to CDH 5.5.1.
A workaround is to use the shell action instead, but I think the spark action 
describes a Spark task better.
Any suggestion?
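For reference, the shell action workaround I mean is roughly the following sketch (the wrapper script name {{submit_myapp.sh}} is just a placeholder; it would invoke {{spark-submit}} on the PySpark script directly, so that spark-submit sets up the pyspark module path itself):

{code:xml}
  <action name="myapp-shell">
      <shell xmlns="uri:oozie:shell-action:0.2">
          <job-tracker>${job_tracker}</job-tracker>
          <name-node>${name_node}</name-node>
          <exec>submit_myapp.sh</exec>
          <argument>${arg1}</argument>
          <!-- ship both the wrapper and the PySpark script to the task -->
          <file>submit_myapp.sh</file>
          <file>${my_script}</file>
      </shell>
      <ok to="hive_import"/>
      <error to="send_email"/>
  </action>
{code}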



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
