[ https://issues.apache.org/jira/browse/MESOS-298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485801#comment-13485801 ]
Qinghe Jin commented on MESOS-298:
----------------------------------
Hi Ben,
I have compared FrameworkExecutor.java carefully, and it seems I already have
the latest version of that file.
It is not easy to reproduce this from trunk, because I have changed both Mesos
and FrameworkScheduler to try to support disk load balancing. That work is
still experimental.
TASK_LOST is not the whole story. What bothers me now are some new kinds of
errors, such as the one below:
Task Id : attempt_201210291044_0002_m_000001_0, Status : FAILED
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/spill0.out
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:127)
    at org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:121)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1392)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1298)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:437)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
The error suggests a lack of disk space, but the disks still have plenty of
free space. Another problem is:
Task Id : attempt_201210291044_0002_m_000007_0, Status : FAILED
java.lang.Throwable: Child Error
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:278)
Caused by: java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:265)
java.lang.Throwable: Child Error
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:278)
Caused by: java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:265)
12/10/29 10:47:04 WARN mapred.JobClient: Error reading task outputhttp://blade15:50060/tasklog?plaintext=true&attemptid=attempt_201210291044_0002_m_000007_0&filter=stdout
12/10/29 10:47:04 WARN mapred.JobClient: Error reading task outputhttp://blade15:50060/tasklog?plaintext=true&attemptid=attempt_201210291044_0002_m_000007_0&filter=stderr
12/10/29 10:47:04 INFO mapred.JobClient: Task Id : attempt_201210291044_0002_r_000000_0, Status : FAILED
It is a new week, and it looks like I have a lot to do.
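For the DiskErrorException above, free space on the machine is only part of
the picture. As far as I understand, Hadoop's LocalDirAllocator only uses
directories from mapred.local.dir that exist, are writable by the user running
the task, and have enough usable space for the requested file. The following
is a minimal sketch, not Hadoop code, for checking those conditions by hand.
The class name LocalDirCheck, the problems helper, and the 64 MB threshold are
assumptions made for illustration; the directories to check are passed on the
command line.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of Hadoop or Mesos. */
final class LocalDirCheck {

    /** Returns a description of each problem found for one candidate directory. */
    static List<String> problems(File dir, long requiredBytes) {
        List<String> found = new ArrayList<String>();
        if (!dir.exists()) {
            found.add("does not exist");
            return found;
        }
        if (!dir.isDirectory()) {
            found.add("is not a directory");
            return found;
        }
        if (!dir.canWrite()) {
            found.add("is not writable by the current user");
        }
        // Usable space is what this JVM may actually write, which can be
        // less than the free space reported for the whole partition.
        long usable = dir.getUsableSpace();
        if (usable < requiredBytes) {
            found.add("has only " + usable + " usable bytes, need " + requiredBytes);
        }
        return found;
    }

    public static void main(String[] args) {
        long requiredBytes = 64L * 1024 * 1024; // assumed spill size, adjust as needed
        for (String path : args) {
            List<String> found = problems(new File(path), requiredBytes);
            System.out.println(path + ": " + (found.isEmpty() ? "OK" : found.toString()));
        }
    }
}
```

To be meaningful, this should run as the same user and on the same slave as
the failing task, with the directories taken from the mapred.local.dir setting
that the task actually sees.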
> Executor fails to start
> -----------------------
>
> Key: MESOS-298
> URL: https://issues.apache.org/jira/browse/MESOS-298
> Project: Mesos
> Issue Type: Question
> Components: framework, slave
> Affects Versions: 0.9.0
> Environment: openSUSE 11.0
> Reporter: Qinghe Jin
>
> When the master asks the Hadoop executor to start, the executor is forked
> successfully but fails quickly, which results in TASK_LOST. The output in
> **/executors/default/runs/id/stderr looks like this:
> Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/mesos/Executor
>     at java.lang.ClassLoader.defineClass1(Native Method)
>     at java.lang.ClassLoader.defineClassCond(ClassLoader.java:632)
>     at java.lang.ClassLoader.defineClass(ClassLoader.java:616)
>     at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
>     at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
>     at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
> Caused by: java.lang.ClassNotFoundException: org.apache.mesos.Executor
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>     ... 12 more
> Could not find the main class: org.apache.hadoop.mapred.FrameworkExecutor. Program will exit.
> I know the reason is that the caller can't find org.apache.mesos.Executor. I
> have found the class in mesos-0.9.0.jar, and I am sure the jar can be found
> (if not, the jobtracker would fail to start). But each time I run it, the
> executor fails quickly.
> I am not familiar with Java, so I tried everything I could find on Google, but
> I still can't fix it. I have been struggling with this for almost a week. Can
> anyone help? I would appreciate it very much. Thanks in advance!
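The quoted trace shows the JVM locating FrameworkExecutor itself (it gets as
far as defineClass) but not org.apache.mesos.Executor, which suggests that the
classpath of the executor process may differ from the jobtracker's. The
following is a minimal sketch for checking that, assuming it is launched with
exactly the classpath and environment the slave uses to start the executor.
ClasspathProbe and isLoadable are hypothetical names introduced here, not part
of Mesos or Hadoop.

```java
/** Hypothetical diagnostic for illustration; not part of Mesos or Hadoop. */
final class ClasspathProbe {

    /** Returns true if the named class can be located and loaded by this JVM. */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: load the class without running its static initializers.
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            // The class itself is not on the classpath.
            return false;
        } catch (LinkageError e) {
            // Includes NoClassDefFoundError: the class file was found, but
            // something it depends on could not be loaded.
            return false;
        }
    }

    public static void main(String[] args) {
        // Print the classpath this JVM was actually started with.
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
        String[] names = args.length > 0
                ? args
                : new String[] {
                    "org.apache.mesos.Executor",
                    "org.apache.hadoop.mapred.FrameworkExecutor"
                };
        for (String name : names) {
            System.out.println(name + " loadable: " + isLoadable(name));
        }
    }
}
```

If org.apache.mesos.Executor is reported as not loadable here but the
jobtracker starts fine, the mesos jar is most likely missing from the
executor's classpath rather than from the jobtracker's.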