Breandán Mac Parland created OOZIE-2479:
-------------------------------------------
Summary: SparkContext Not Using Yarn Config
Key: OOZIE-2479
URL: https://issues.apache.org/jira/browse/OOZIE-2479
Project: Oozie
Issue Type: Bug
Components: workflow
Affects Versions: 4.2.0
Environment: Oozie 4.2.0.2.3.4.0-3485
Spark 1.4.1
Scala 2.10.5
HDP 2.3
Reporter: Breandán Mac Parland
The Spark action does not appear to use the jobTracker setting from
job.properties (or from the YARN configuration) when creating the SparkContext.
When the jobTracker property is set to myDomain:8050 (to match the
yarn.resourcemanager.address setting), the Oozie UI (job > action > action
configuration) shows that myDomain:8050 is being submitted. However, when I
drill down into the Hadoop job history logs, I see an error indicating that the
default 0.0.0.0:8032 is being used instead:
*job.properties*
{code}
nameNode=hdfs://myDomain:8020
jobTracker=myOtherDomain:8050
queueName=default
# master: have also tried yarn-cluster and yarn-client
master=yarn
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/bmp/
# spark2 holds the updated Spark libs I need
oozie.action.sharelib.for.spark=spark2
{code}
*workflow*
{code}
<workflow-app xmlns='uri:oozie:workflow:0.5' name='MyWorkflow'>
    <start to='spark-node' />
    <action name='spark-node'>
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/bmp/output"/>
            </prepare>
            <master>${master}</master>
            <name>My Workflow</name>
            <class>uk.co.bmp.drivers.MyDriver</class>
            <jar>${nameNode}/bmp/lib/bmp.spark-assembly-1.0.jar</jar>
            <spark-opts>--conf spark.yarn.historyServer.address=http://myDomain:18088
                --conf spark.eventLog.dir=hdfs://myDomain/user/spark/applicationHistory
                --conf spark.eventLog.enabled=true</spark-opts>
            <arg>${nameNode}/bmp/input/input_file.csv</arg>
        </spark>
        <ok to="end" />
        <error to="fail" />
    </action>
    <kill name="fail">
        <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name='end' />
</workflow-app>
{code}
*Error*
{code}
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain],
main() threw exception,Call From myDomain/ipAddress to 0.0.0.0:8032 failed on
connection exception: java.net.ConnectException: Connection refused. For more
details see: http://wiki.apache.org/hadoop/ConnectionRefused
...
at org.apache.spark.SparkContext.<init>(SparkContext.scala:497)
...
{code}
Where is it pulling 8032 from? Why does it not use the port configured in the
job.properties?
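One possible explanation, offered as an assumption and not a confirmed
diagnosis of this issue: 0.0.0.0:8032 is the stock value of
yarn.resourcemanager.address in Hadoop's yarn-default.xml (the hostname
defaults to 0.0.0.0 and the port to 8032), so it is what a YARN client falls
back to when no yarn-site.xml override reaches it. The Python sketch below only
mimics that fallback to illustrate the idea. It is not Hadoop, Spark, or Oozie
code; the helper names load_site and resolve are hypothetical, and the
yarn-site.xml contents are made up from the values in this report.

```python
import xml.etree.ElementTree as ET

# Stock values from Hadoop's yarn-default.xml (only the two relevant keys).
YARN_DEFAULTS = {
    "yarn.resourcemanager.hostname": "0.0.0.0",
    "yarn.resourcemanager.address": "${yarn.resourcemanager.hostname}:8032",
}


def load_site(xml_text):
    """Parse the <property> entries of a Hadoop-style *-site.xml into a dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def resolve(key, site):
    """Return the value for key, letting site overrides win over the defaults.

    ${other.key} references are expanded in a single pass, which is enough
    for the two keys modelled here.
    """
    merged = {**YARN_DEFAULTS, **site}
    value = merged[key]
    for name, replacement in merged.items():
        value = value.replace("${" + name + "}", replacement)
    return value


# Hypothetical inputs: no site file visible vs. a site file with the override.
NO_OVERRIDE = "<configuration/>"
WITH_OVERRIDE = """
<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>myDomain:8050</value>
  </property>
</configuration>
"""

if __name__ == "__main__":
    key = "yarn.resourcemanager.address"
    print(resolve(key, load_site(NO_OVERRIDE)))    # 0.0.0.0:8032
    print(resolve(key, load_site(WITH_OVERRIDE)))  # myDomain:8050
```

If this assumption holds, the thing to check would be whether the cluster's
yarn-site.xml is actually visible to the process that constructs the
SparkContext, since the jobTracker value shown in the Oozie UI would not by
itself change what that process resolves.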