[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473030#comment-13473030
 ] 

Li Ming commented on MAPREDUCE-3655:
------------------------------------

This also happens on 2.0.1-alpha, and it seems related to resource 
localization. In the DistributedShell example, the ContainerLaunchContext of 
the AM has LocalResources (the AppMaster.jar), but the other task containers 
do not. Only a container with local resources creates a directory like 
/tmp/nm-local-dir/usercache/jiangbing/appcache/application_1325062142731_0006, 
so the non-AM containers fail when they try to use these directories.
                
> Exception from launching allocated container
> --------------------------------------------
>
>                 Key: MAPREDUCE-3655
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3655
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster
>    Affects Versions: 0.23.0
>            Reporter: Bing Jiang
>
> I use Hadoop-Yarn to deploy my real-time distributed computation system, and 
> I got a reply from mapreduce-u...@hadoop.apache.org telling me to follow the 
> guides below:
>          
> http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/YARN.html
>          
> http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html
> When I followed the steps to construct my Client and ApplicationMaster, I 
> ran into an issue: the NM fails to launch a Container because of a 
> java.io.FileNotFoundException.
> The relevant part of the NM log is attached below:
>  ....
> 2011-12-29 15:49:16,250 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
>  Adding container_1325062142731_0006_01_000001 to application 
> application_1325062142731_0006
> 2011-12-29 15:49:16,250 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Dispatching the event 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.event.ApplicationLocalizationEvent.EventType:
>  INIT_APPLICATION_RESOURCES
> 2011-12-29 15:49:16,250 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Dispatching the event 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationInitedEvent.EventType:
>  APPLICATION_INITED
> 2011-12-29 15:49:16,250 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
>  Processing application_1325062142731_0006 of type APPLICATION_INITED
> 2011-12-29 15:49:16,250 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
>  Application application_1325062142731_0006 transitioned from INITING to 
> RUNNING
> 2011-12-29 15:49:16,250 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Dispatching the event 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.event.LogHandlerAppStartedEvent.EventType:
>  APPLICATION_STARTED
> 2011-12-29 15:49:16,250 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Dispatching the event 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerInitEvent.EventType:
>  INIT_CONTAINER
> 2011-12-29 15:49:16,250 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
>  Processing container_1325062142731_0006_01_000001 of type INIT_CONTAINER
> 2011-12-29 15:49:16,250 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
>  Container container_1325062142731_0006_01_000001 transitioned from NEW to 
> LOCALIZED
> 2011-12-29 15:49:16,250 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Dispatching the event 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncherEvent.EventType:
>  LAUNCH_CONTAINER
> 2011-12-29 15:49:16,287 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Dispatching the event 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerEvent.EventType:
>  CONTAINER_LAUNCHED
> 2011-12-29 15:49:16,287 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
>  Processing container_1325062142731_0006_01_000001 of type CONTAINER_LAUNCHED
> 2011-12-29 15:49:16,287 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
>  Container container_1325062142731_0006_01_000001 transitioned from LOCALIZED 
> to RUNNING
> 2011-12-29 15:49:16,288 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Dispatching the event 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerStartMonitoringEvent.EventType:
>  START_MONITORING_CONTAINER
> 2011-12-29 15:49:16,289 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
>  Failed to launch container
> java.io.FileNotFoundException: File 
> /tmp/nm-local-dir/usercache/jiangbing/appcache/application_1325062142731_0006 
> does not exist
>     at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:431)
>     at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:815)
>     at 
> org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
>     at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
>     at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:700)
>     at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:697)
>    at 
> org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
>     at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:697)
>     at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:123)
>     at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:237)
>     at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:67)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
> 2011-12-29 15:49:16,290 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Dispatching the event 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerExitEvent.EventType:
>  CONTAINER_EXITED_WITH_FAILURE
> 2011-12-29 15:49:16,290 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
>  Processing container_1325062142731_0006_01_000001 of type 
> CONTAINER_EXITED_WITH_FAILURE
> 2011-12-29 15:49:16,290 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
>  Container container_1325062142731_0006_01_000001 transitioned from RUNNING 
> to EXITED_WITH_FAILURE
> 2011-12-29 15:49:16,290 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Dispatching the event 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncherEvent.EventType:
>  CLEANUP_CONTAINER
> 2011-12-29 15:49:16,290 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
>  Cleaning up container container_1325062142731_0006_01_000001
> 2011-12-29 15:49:16,290 DEBUG 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
>  Marking container container_1325062142731_0006_01_000001 as inactive
> 2011-12-29 15:49:16,290 DEBUG 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
>  Getting pid for container container_1325062142731_0006_01_000001 to kill 
> from pid file 
> /tmp/nm-local-dir/nmPrivate/container_1325062142731_0006_01_000001.pid
> 2011-12-29 15:49:16,290 DEBUG 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
>  Accessing pid for container container_1325062142731_0006_01_000001 from pid 
> file /tmp/nm-local-dir/nmPrivate/container_1325062142731_0006_01_000001.pid
> 2011-12-29 15:49:16,307 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Dispatching the event 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.event.ContainerLocalizationCleanupEvent.EventType:
>  CLEANUP_CONTAINER_RESOURCES
> To find the cause, I traced back through the source code. I found this in 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
>   @Override
>   public int launchContainer(Container container,
>       Path nmPrivateContainerScriptPath, Path nmPrivateTokensPath,
>       String userName, String appId, Path containerWorkDir)
>       throws IOException {
>     ....
>     String[] sLocalDirs = getConf().getStrings(
>         YarnConfiguration.NM_LOCAL_DIRS,
>         YarnConfiguration.DEFAULT_NM_LOCAL_DIRS);
>     for (String sLocalDir : sLocalDirs) {
>       Path usersdir = new Path(sLocalDir, ContainerLocalizer.USERCACHE);
>       Path userdir = new Path(usersdir, userName);
>       Path appCacheDir = new Path(userdir, ContainerLocalizer.APPCACHE);
>       Path appDir = new Path(appCacheDir, appIdStr);
>       Path containerDir = new Path(appDir, containerIdStr);
>       lfs.mkdir(containerDir, null, false);
>     }
>     ....
> In lfs.mkdir(containerDir, null, false);, according to the mkdir API, false 
> means the parent path is not created if it does not already exist.
> In my Hadoop project, I changed lfs.mkdir(containerDir, null, false); to 
> lfs.mkdir(containerDir, null, true);, and then my program ran fine.
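
The effect of the createParent flag described above can be reproduced without
a cluster. The following is a minimal sketch, not the NodeManager code: it
uses java.nio.file as a stand-in for Hadoop's FileContext.mkdir(dir,
permission, createParent), and the class name MkdirParentDemo and its helper
methods are hypothetical names introduced only for this illustration. The
directory layout mirrors the loop in the quoted launchContainer.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

final class MkdirParentDemo {

    // Builds <localDir>/usercache/<user>/appcache/<appId>/<containerId>,
    // the same layout the quoted launchContainer loop assembles.
    static Path containerDir(Path localDir, String user,
                             String appId, String containerId) {
        return localDir.resolve("usercache").resolve(user)
                .resolve("appcache").resolve(appId).resolve(containerId);
    }

    // Stand-in (assumption) for lfs.mkdir(dir, null, createParent).
    // Returns true if the directory was created, false if it could not be
    // created because a parent directory is missing.
    static boolean mkdir(Path dir, boolean createParent) throws IOException {
        try {
            if (createParent) {
                Files.createDirectories(dir); // creates missing parents too
            } else {
                Files.createDirectory(dir);   // requires the parent to exist
            }
            return true;
        } catch (NoSuchFileException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // A fresh temp directory plays the role of an empty nm-local-dir,
        // i.e. no localization step has created the application directory.
        Path localDir = Files.createTempDirectory("nm-local-dir");
        Path dir = containerDir(localDir, "jiangbing",
                "application_1325062142731_0006",
                "container_1325062142731_0006_01_000001");

        System.out.println("createParent=false -> " + mkdir(dir, false));
        System.out.println("createParent=true  -> " + mkdir(dir, true));
    }
}
```

In this sketch the first call is expected to fail because the application
directory was never created, which matches the FileNotFoundException in the
NM log; the second call succeeds because the missing parents are created on
the way. Whether the NodeManager should create the parents itself or rely on
localization having done so is the design question the issue raises.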

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
