[ https://issues.apache.org/jira/browse/OOZIE-2536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617277#comment-16617277 ]

Clemens Valiente commented on OOZIE-2536:
-----------------------------------------

[~andras.piros] It's a pretty rare race condition that happens once or twice 
out of hundreds of Sqoop jobs, and we're not running Oozie 5.0 in production, 
so it's hard for me to reproduce. Looking at the Oozie 5.0.0 SqoopMain, the 
sqoop-site.xml is still created inside the LauncherMain, so the problem still 
seems to be there: the file is created after the LocalContainerLauncher has 
been initialized, so it might get deleted first.
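
To make the ordering concrete, here is a minimal sketch of the kind of thing the Sqoop launcher main does when it writes that file. The class and method names below are mine, not Oozie's actual code; only the file name and the general shape come from the discussion above.

{code:java}
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;

// Illustrative only -- not Oozie's actual SqoopMain code, just the shape of it.
public class SqoopSiteRaceSketch {

    static void writeSqoopSite(Configuration actionConf) throws IOException {
        // The file is written into the container's current working directory,
        // the same directory the uberized LocalContainerLauncher sweeps.
        File sqoopSite = new File(System.getProperty("user.dir"), "sqoop-site.xml");
        try (OutputStream os = new FileOutputStream(sqoopSite)) {
            actionConf.writeXml(os);
        }
        // If the launcher already took its snapshot of "expected" local files
        // before this point, a later cleanup pass may treat sqoop-site.xml as
        // unexpected and delete it before Sqoop ever reads it.
    }
}
{code}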


The previous fix of adding propagation-conf.xml to the 
OozieLauncherOutputCommitter might be the wrong approach here, since 
sqoop-site.xml is only needed by Sqoop actions and shouldn't be created for 
every job action. I am afraid the correct course of action would be to have a 
specific OutputCommitter for each LauncherMain class, creating the files that 
are needed later...
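
Something along these lines is what I have in mind. This is purely a sketch with made-up names, built on the plain Hadoop OutputCommitter API rather than Oozie's existing classes: each launcher main would register only the files its action needs, and the committer would create them up front so they exist before the launcher's cleanup pass.

{code:java}
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.OutputCommitter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

// Hypothetical sketch only: the class name, constructor and wiring are made up,
// and this extends the plain Hadoop OutputCommitter, not Oozie's existing
// OozieLauncherOutputCommitter.
public class ActionFilesOutputCommitter extends OutputCommitter {

    /** file name in the container working dir -> configuration to serialize */
    private final Map<String, Configuration> actionFiles;

    public ActionFilesOutputCommitter(Map<String, Configuration> actionFiles) {
        // e.g. the Sqoop launcher would register sqoop-site.xml here,
        // while other launcher mains would register nothing extra
        this.actionFiles = actionFiles;
    }

    @Override
    public void setupJob(JobContext context) throws IOException {
        // Create the action-specific files before any subtask runs, so they
        // already exist when the uberized launcher does its cleanup pass.
        for (Map.Entry<String, Configuration> e : actionFiles.entrySet()) {
            File f = new File(System.getProperty("user.dir"), e.getKey());
            try (OutputStream os = new FileOutputStream(f)) {
                e.getValue().writeXml(os);
            }
        }
    }

    @Override
    public void setupTask(TaskAttemptContext context) { /* nothing per task */ }

    @Override
    public boolean needsTaskCommit(TaskAttemptContext context) { return false; }

    @Override
    public void commitTask(TaskAttemptContext context) { /* nothing to commit */ }

    @Override
    public void abortTask(TaskAttemptContext context) { /* nothing to roll back */ }
}
{code}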

> Hadoop's cleanup of local directory in uber mode causing failures
> -----------------------------------------------------------------
>
>                 Key: OOZIE-2536
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2536
>             Project: Oozie
>          Issue Type: Bug
>            Reporter: Satish Subhashrao Saley
>            Assignee: Satish Subhashrao Saley
>            Priority: Blocker
>             Fix For: 4.3.0
>
>         Attachments: OOZIE-2536-1.patch
>
>
> In our environment, we faced an issue where an uberized Shell action was getting 
> stuck even though the shell command completed with status 0. Please refer to the 
> attached syslog and stdout of the launcher job; here I point out part of the
> stdout:
> {quote}
> >>> Invoking Shell command line now >>
> Stdoutput myshellType=qmyshellUpdate
> Exit code of the Shell command 0
> <<< Invocation of Shell command completed <<<
> <<< Invocation of Main class completed <<<
> {quote} 
> syslog
> {quote}
> 2016-05-23 11:15:52,587 WARN [uber-SubtaskRunner] 
> org.apache.hadoop.mapred.LocalContainerLauncher: Unable to delete unexpected 
> local file/dir .action.xml.crc: insufficient permissions?
> 2016-05-23 11:15:52,588 FATAL [AsyncDispatcher event handler] 
> org.apache.hadoop.conf.Configuration: error parsing conf propagation-conf.xml
> java.io.FileNotFoundException: 
> /tmp/yarn-local/usercache/saley/appcache/application_1234_123/container_e01_1234_123_01_000001/propagation-conf.xml
>  (No such file or directory)
>     at java.io.FileInputStream.open0(Native Method)
>     at java.io.FileInputStream.open(FileInputStream.java:195)
>     at java.io.FileInputStream.<init>(FileInputStream.java:138)
>     at java.io.FileInputStream.<init>(FileInputStream.java:93)
>     at 
> sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:90)
>     at 
> sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:188)
>     at java.net.URL.openStream(URL.java:1038)
>     at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2468)
>     at 
> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2539)
>     at 
> org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2492)
>     at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2405)
>     at org.apache.hadoop.conf.Configuration.get(Configuration.java:981)
>     at 
> org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1031)
>     at org.apache.hadoop.conf.Configuration.getInt(Configuration.java:1251)
>     at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.getMemoryRequired(TaskAttemptImpl.java:568)
>     at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.updateMillisCounters(TaskAttemptImpl.java:1295)
>     at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.createJobCounterUpdateEventTASucceeded(TaskAttemptImpl.java:1323)
>     at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.access$3500(TaskAttemptImpl.java:147)
>     at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl$SucceededTransition.transition(TaskAttemptImpl.java:1710)
>     at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl$SucceededTransition.transition(TaskAttemptImpl.java:1701)
>     at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>     at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>     at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>     at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>     at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1085)
>     at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:146)
>     at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1394)
>     at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1386)
>     at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
>     at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
>     at java.lang.Thread.run(Thread.java:745)
> 2016-05-23 11:15:52,590 FATAL [AsyncDispatcher event handler] 
> org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
> java.lang.RuntimeException: java.io.FileNotFoundException: 
> /grid/5/tmp/yarn-local/usercache/saley/appcache/application_1234_123/container_e01_1234_123_01_000001/propagation-conf.xml
>  (No such file or directory)
>     at 
> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2639)
>     at 
> org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2492)
>     at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2405)
>     at org.apache.hadoop.conf.Configuration.get(Configuration.java:981)
>     at 
> org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1031)
>     at org.apache.hadoop.conf.Configuration.getInt(Configuration.java:1251)
>     at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.getMemoryRequired(TaskAttemptImpl.java:568)
>     at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.updateMillisCounters(TaskAttemptImpl.java:1295)
>     at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.createJobCounterUpdateEventTASucceeded(TaskAttemptImpl.java:1323)
>     at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.access$3500(TaskAttemptImpl.java:147)
>     at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl$SucceededTransition.transition(TaskAttemptImpl.java:1710)
>     at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl$SucceededTransition.transition(TaskAttemptImpl.java:1701)
>     at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>     at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>     at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>     at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>     at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1085)
>     at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:146)
>     at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1394)
>     at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1386)
>     at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
>     at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.FileNotFoundException: 
> /tmp/yarn-local/usercache/saley/appcache/application_1234_123/container_e01_1234_123_01_000001/propagation-conf.xml
>  (No such file or directory)
>     at java.io.FileInputStream.open0(Native Method)
>     at java.io.FileInputStream.open(FileInputStream.java:195)
>     at java.io.FileInputStream.<init>(FileInputStream.java:138)
>     at java.io.FileInputStream.<init>(FileInputStream.java:93)
>     at 
> sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:90)
>     at 
> sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:188)
>     at java.net.URL.openStream(URL.java:1038)
>     at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2468)
>     at 
> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2539)
>     ... 22 more
> 2016-05-23 11:15:52,591 INFO [AsyncDispatcher ShutDown handler] 
> org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..
> 2016-05-23 11:15:52,591 ERROR [AsyncDispatcher ShutDown handler] 
> org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
> Thread[AsyncDispatcher ShutDown handler,5,main] threw an Exception.
> java.lang.SecurityException: Intercepted System.exit(-1)
>     at 
> org.apache.oozie.action.hadoop.LauncherSecurityManager.checkExit(LauncherMapper.java:637)
>     at java.lang.Runtime.exit(Runtime.java:107)
>     at java.lang.System.exit(System.java:971)
>     at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$2.run(AsyncDispatcher.java:294)
>     at java.lang.Thread.run(Thread.java:745)
> 2016-05-23 11:16:44,589 WARN [LeaseRenewer:sa...@namenode.com:8020] 
> org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final 
> parameter: hadoop.tmp.dir;  Ignoring.
> 2016-05-23 11:20:53,677 INFO [Socket Reader #2 for port 50500] 
> SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for saley 
> (auth:SIMPLE)
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
