[
https://issues.apache.org/jira/browse/OOZIE-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Satish Subhashrao Saley updated OOZIE-3062:
-------------------------------------------
Description:
OOZIE-2569 created core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml
for spark action. But for spark to consider these files ([spark code
ref|https://github.com/apache/spark/blob/83fe3b5e10f8dc62245ea37143abb96be0f39805/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L284]),
we need to set HADOOP_CONF_DIR environmental variable, otherwise it will point
to default HADOOP_CONF_DIR configured on the node. Therefore, Oozie should set
HADOOP_CONF_DIR.
Due to a bug YARN-4727, yarn was not considering user specified value for
whiltelisted environment variables including HADOOP_CONF_DIR. So, this will
work on specified hadoop fix versions in YARN-4727.
This fix in Oozie will be no-op on Hadoop without YARN-4727 fix and Oozie will
behave as it is.
To verify this on one node machine.
1. In hadoop mapred-site.xml, set
{code}
<property>
<name>mapreduce.fileoutputcommitter.marksuccessfuljobs</name>
<value>false</value>
</property>
{code}
2. Run any spark file copy example from Oozie. Check output directory, _SUCCESS
flag is missing. It should be there because we explicitly set
conf.set("mapreduce.fileoutputcommitter.marksuccessfuljobs", "true"); in
https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/action/hadoop/JavaActionExecutor.java#L293
was:
OOZIE-2569 created core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml
for spark action. But for spark to consider these files ([spark code
ref|https://github.com/apache/spark/blob/83fe3b5e10f8dc62245ea37143abb96be0f39805/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L284]),
we need to set HADOOP_CONF_DIR environmental variable, otherwise it will point
to default HADOOP_CONF_DIR configured on the node. Therefore, Oozie should set
HADOOP_CONF_DIR.
Due to a bug YARN-4727, yarn was not considering user specified value for
whiltelisted environment variables including HADOOP_CONF_DIR. So, this will
work on specified hadoop fix versions in YARN-4727.
This fix in Oozie will be no-op on Hadoop without YARN-4727 fix and will behave
as it is.
To verify this on one node machine.
1. In hadoop mapred-site.xml, set
{code}
<property>
<name>mapreduce.fileoutputcommitter.marksuccessfuljobs</name>
<value>false</value>
</property>
{code}
2. Run any spark file copy example from Oozie. Check output directory, _SUCCESS
flag is missing. It should be there because we explicitly set
conf.set("mapreduce.fileoutputcommitter.marksuccessfuljobs", "true"); in
https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/action/hadoop/JavaActionExecutor.java#L293
> Set HADOOP_CONF_DIR for spark action
> ------------------------------------
>
> Key: OOZIE-3062
> URL: https://issues.apache.org/jira/browse/OOZIE-3062
> Project: Oozie
> Issue Type: Bug
> Reporter: Satish Subhashrao Saley
> Assignee: Satish Subhashrao Saley
> Attachments: OOZIE-3062-1.patch
>
>
> OOZIE-2569 created core-site.xml, hdfs-site.xml, mapred-site.xml,
> yarn-site.xml for spark action. But for spark to consider these files ([spark
> code
> ref|https://github.com/apache/spark/blob/83fe3b5e10f8dc62245ea37143abb96be0f39805/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L284]),
> we need to set HADOOP_CONF_DIR environmental variable, otherwise it will
> point to default HADOOP_CONF_DIR configured on the node. Therefore, Oozie
> should set HADOOP_CONF_DIR.
> Due to a bug YARN-4727, yarn was not considering user specified value for
> whiltelisted environment variables including HADOOP_CONF_DIR. So, this will
> work on specified hadoop fix versions in YARN-4727.
> This fix in Oozie will be no-op on Hadoop without YARN-4727 fix and Oozie
> will behave as it is.
> To verify this on one node machine.
> 1. In hadoop mapred-site.xml, set
> {code}
> <property>
> <name>mapreduce.fileoutputcommitter.marksuccessfuljobs</name>
> <value>false</value>
> </property>
> {code}
> 2. Run any spark file copy example from Oozie. Check output directory,
> _SUCCESS flag is missing. It should be there because we explicitly set
> conf.set("mapreduce.fileoutputcommitter.marksuccessfuljobs", "true"); in
> https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/action/hadoop/JavaActionExecutor.java#L293
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)