Breandán Mac Parland created OOZIE-2479:
-------------------------------------------

             Summary: SparkContext Not Using Yarn Config
                 Key: OOZIE-2479
                 URL: https://issues.apache.org/jira/browse/OOZIE-2479
             Project: Oozie
          Issue Type: Bug
          Components: workflow
    Affects Versions: 4.2.0
         Environment: Oozie 4.2.0.2.3.4.0-3485
Spark 1.4.1
Scala 2.10.5
HDP 2.3

            Reporter: Breandán Mac Parland


The Spark action does not appear to use the jobTracker setting from 
job.properties (or from the YARN config) when creating the SparkContext. With 
the jobTracker property set to myDomain:8050 (to match the 
yarn.resourcemanager.address setting), the Oozie UI (click on job > action > 
action configuration) shows myDomain:8050 being submitted. However, when I 
drill down into the Hadoop job history logs, I see an error indicating that the 
default 0.0.0.0:8032 is being used instead:

*job.properties*
{code}
nameNode=hdfs://myDomain:8020
jobTracker=myOtherDomain:8050
queueName=default
# have also tried yarn-cluster and yarn-client
master=yarn
 
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/bmp/
# I've added the updated spark libs I need in here
oozie.action.sharelib.for.spark=spark2
{code}
 
*workflow*
{code}
<workflow-app xmlns='uri:oozie:workflow:0.5' name='MyWorkflow'>
    <start to='spark-node' />
    <action name='spark-node'>
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/bmp/output"/>
            </prepare>
            <master>${master}</master>
            <name>My Workflow</name>
            <class>uk.co.bmp.drivers.MyDriver</class>
            <jar>${nameNode}/bmp/lib/bmp.spark-assembly-1.0.jar</jar>
            <spark-opts>--conf spark.yarn.historyServer.address=http://myDomain:18088 --conf spark.eventLog.dir=hdfs://myDomain/user/spark/applicationHistory --conf spark.eventLog.enabled=true</spark-opts>
            <arg>${nameNode}/bmp/input/input_file.csv</arg>
        </spark>
        <ok to="end" />
        <error to="fail" />
    </action>
    <kill name="fail">
        <message>Workflow failed, error
            message[${wf:errorMessage(wf:lastErrorNode())}]
        </message>
    </kill>
    <end name='end' />
</workflow-app>
{code}

*Error*
{code}
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, Call From myDomain/ipAddress to 0.0.0.0:8032 failed on connection exception: java.net.ConnectException: Connection refused. For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
...
at org.apache.spark.SparkContext.<init>(SparkContext.scala:497)
...
{code}

Where is it pulling 8032 from? Why does it not use the port configured in the 
job.properties?
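
One possible lead: 0.0.0.0:8032 matches the stock default for yarn.resourcemanager.address in yarn-default.xml, which would suggest the SparkContext is building its YARN configuration without seeing the cluster's yarn-site.xml. The Python sketch below only illustrates that fallback and is not Hadoop's actual lookup code. The function name and the hard-coded default string are assumptions made for illustration (the real default is derived from yarn.resourcemanager.hostname).

```python
import xml.etree.ElementTree as ET

# Assumed stand-in for the stock YARN default used when the property is absent.
DEFAULT_RM_ADDRESS = "0.0.0.0:8032"


def resolve_rm_address(yarn_site_xml):
    """Return yarn.resourcemanager.address from a yarn-site.xml string,
    or the stock default when the property is missing."""
    root = ET.fromstring(yarn_site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() == "yarn.resourcemanager.address":
            return prop.findtext("value", "").strip()
    return DEFAULT_RM_ADDRESS


# Hypothetical site file matching the address used in this report.
configured = """<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>myDomain:8050</value>
  </property>
</configuration>"""

print(resolve_rm_address(configured))          # site file visible: myDomain:8050
print(resolve_rm_address("<configuration/>"))  # property missing: 0.0.0.0:8032
```

If the second case is what happens inside the launcher, checking whether yarn-site.xml is on the classpath of the process that creates the SparkContext may narrow this down.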



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
