[ 
https://issues.apache.org/jira/browse/OOZIE-2547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15317486#comment-15317486
 ] 

Satish Subhashrao Saley commented on OOZIE-2547:
------------------------------------------------

Hello [~rkanter], could you please review the patch?
I have removed the logic that populated {{spark.executor.extraClassPath}}, 
{{spark.driver.extraClassPath}}, {{--jars}}, and {{spark.yarn.dist.files}}. 
Instead, we now add the distributed-cache files via {{--files}}. While doing 
so, I also made sure that the HDFS paths to those files are formulated such 
that Spark won't make another copy. 

I have tested the patch locally as well as on clusters; it appears to work 
fine with {{--master}} set to local, yarn-client, and yarn-cluster. 
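As a rough illustration of the idea (not the actual patch code; the helper, the default-filesystem URI, and the file names are all hypothetical), the cache entries from {{mapreduce.job.cache.files}} can be joined into a single {{--files}} value, qualifying bare paths with the default filesystem so Spark sees fully qualified HDFS URIs and skips re-uploading them to {{.sparkStaging}}:

```python
from urllib.parse import urlparse

# Assumed default filesystem; in Oozie this would come from fs.defaultFS.
DEFAULT_FS = "hdfs://namenode:8020"

def qualify(uri):
    # Prepend the default FS when the cache entry is a bare path, so Spark
    # recognizes the file is already on HDFS and does not copy it again.
    return uri if urlparse(uri).scheme else DEFAULT_FS + uri

def build_files_option(cache_files):
    # Comma-separated value for spark-submit --files.
    return ",".join(qualify(f) for f in cache_files)

cache = [
    "/user/saley/lib/oozie-examples.jar",      # bare path -> gets qualified
    "hdfs://namenode:8020/conf/hive-site.xml", # already fully qualified
]
print(build_files_option(cache))
```

Since every entry carries an explicit {{hdfs://}} scheme and authority, Spark's YARN client can match it against the destination filesystem and reuse the existing copy instead of staging a new one.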

> Add mapreduce.job.cache.files to spark action
> ---------------------------------------------
>
>                 Key: OOZIE-2547
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2547
>             Project: Oozie
>          Issue Type: Bug
>            Reporter: Satish Subhashrao Saley
>            Assignee: Satish Subhashrao Saley
>            Priority: Minor
>         Attachments: OOZIE-2547-1.patch
>
>
> Currently, we pass jars using the --jars option when submitting a Spark job, 
> and we add the spark.yarn.dist.files option in yarn-client mode. 
> Instead, we can use only the --files option and pass along the files listed 
> in mapreduce.job.cache.files. While doing so, we make sure that Spark won't 
> make another copy of the files if they already exist on HDFS. We have seen 
> issues where files were copied multiple times, causing exceptions such as:
> {code}
> Diagnostics: Resource 
> hdfs://localhost/user/saley/.sparkStaging/application_1234_123/oozie-examples.jar
>  changed on src filesystem
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
