Hi all,
  In my Giraph job, everything works when I set the number of workers to
200, but with 500 workers the job fails because one or more workers hit an
OutOfMemoryError at an early stage. Once such a worker fails, the other
workers that want to talk to it keep waiting until they have retried 5
times, and then they fail as well.

Has anyone run into this issue before?

Best,
-z


Here is the log with the exception:
2011-10-08 09:26:59,108 INFO org.apache.giraph.comm.RPCCommunications:
getRPCServer: Added jobToken Ident: 17 6a 6f 62 5f 32 30 31 31 30 38 32 36
30 39 31 31 5f 36 36 37 30 39 30, Pass: 12 26 1a f1 d2 51 e1 bf 2d 36 63 11
26 18 17 3d 53 b3 15 f6, Kind: mapreduce.job, Service:
job_201108260911_667090

2011-10-08 09:26:59,116 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
2011-10-08 09:26:59,116 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
2011-10-08 09:26:59,117 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
2011-10-08 09:26:59,117 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
2011-10-08 09:26:59,117 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
2011-10-08 09:26:59,120 INFO
org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
RpcDetailedActivityForPort31250 registered.
2011-10-08 09:26:59,121 INFO
org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
RpcActivityForPort31250 registered.
2011-10-08 09:26:59,123 INFO org.apache.hadoop.ipc.Server: IPC Server
Responder: starting
2011-10-08 09:26:59,123 INFO org.apache.hadoop.ipc.Server: IPC Server
listener on 31250: starting
2011-10-08 09:26:59,127 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 0 on 31250: starting
2011-10-08 09:26:59,127 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 1 on 31250: starting
2011-10-08 09:26:59,133 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 2 on 31250: starting
2011-10-08 09:26:59,133 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 3 on 31250: starting
2011-10-08 09:26:59,137 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 4 on 31250: starting
2011-10-08 09:26:59,144 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 5 on 31250: starting
2011-10-08 09:26:59,144 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 6 on 31250: starting
2011-10-08 09:26:59,144 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 7 on 31250: starting
2011-10-08 09:26:59,144 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 8 on 31250: starting
2011-10-08 09:26:59,144 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 9 on 31250: starting
2011-10-08 09:26:59,145 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 10 on 31250: starting
2011-10-08 09:26:59,145 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 11 on 31250: starting
2011-10-08 09:26:59,145 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 12 on 31250: starting
2011-10-08 09:26:59,145 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 13 on 31250: starting
2011-10-08 09:26:59,145 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 14 on 31250: starting
2011-10-08 09:26:59,145 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 15 on 31250: starting
2011-10-08 09:26:59,146 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 16 on 31250: starting
2011-10-08 09:26:59,146 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 17 on 31250: starting
2011-10-08 09:26:59,146 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 18 on 31250: starting
2011-10-08 09:26:59,146 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 19 on 31250: starting
2011-10-08 09:26:59,146 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 20 on 31250: starting
2011-10-08 09:26:59,146 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 21 on 31250: starting
2011-10-08 09:26:59,147 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 22 on 31250: starting
2011-10-08 09:26:59,147 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 23 on 31250: starting
2011-10-08 09:26:59,147 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 24 on 31250: starting
2011-10-08 09:26:59,147 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 25 on 31250: starting
2011-10-08 09:26:59,147 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 26 on 31250: starting
2011-10-08 09:26:59,147 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 27 on 31250: starting
2011-10-08 09:26:59,148 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 28 on 31250: starting
2011-10-08 09:26:59,148 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 29 on 31250: starting
2011-10-08 09:26:59,148 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 30 on 31250: starting
2011-10-08 09:26:59,148 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 31 on 31250: starting
2011-10-08 09:26:59,148 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 32 on 31250: starting
2011-10-08 09:26:59,148 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 33 on 31250: starting
2011-10-08 09:26:59,149 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 34 on 31250: starting
2011-10-08 09:26:59,149 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 35 on 31250: starting
2011-10-08 09:26:59,149 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 36 on 31250: starting
2011-10-08 09:26:59,149 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 37 on 31250: starting
2011-10-08 09:26:59,149 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 38 on 31250: starting
2011-10-08 09:26:59,149 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 39 on 31250: starting
2011-10-08 09:26:59,150 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 40 on 31250: starting
2011-10-08 09:26:59,150 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 41 on 31250: starting
2011-10-08 09:26:59,150 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 42 on 31250: starting
2011-10-08 09:26:59,150 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 43 on 31250: starting
2011-10-08 09:26:59,150 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 44 on 31250: starting
2011-10-08 09:26:59,150 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 45 on 31250: starting
2011-10-08 09:26:59,151 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 46 on 31250: starting
2011-10-08 09:26:59,151 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 47 on 31250: starting
2011-10-08 09:26:59,151 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 48 on 31250: starting
2011-10-08 09:26:59,151 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 49 on 31250: starting
2011-10-08 09:26:59,151 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 50 on 31250: starting
2011-10-08 09:26:59,152 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 51 on 31250: starting
2011-10-08 09:26:59,152 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 52 on 31250: starting
2011-10-08 09:26:59,152 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 53 on 31250: starting
2011-10-08 09:26:59,152 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 54 on 31250: starting
2011-10-08 09:26:59,153 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 55 on 31250: starting
2011-10-08 09:26:59,153 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 56 on 31250: starting
2011-10-08 09:26:59,153 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 57 on 31250: starting
2011-10-08 09:26:59,153 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 58 on 31250: starting
2011-10-08 09:26:59,153 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 59 on 31250: starting
2011-10-08 09:26:59,154 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 60 on 31250: starting
2011-10-08 09:26:59,154 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 61 on 31250: starting
2011-10-08 09:26:59,154 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 62 on 31250: starting
2011-10-08 09:26:59,154 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 63 on 31250: starting
2011-10-08 09:26:59,155 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 64 on 31250: starting
2011-10-08 09:26:59,155 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 65 on 31250: starting
2011-10-08 09:26:59,155 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 66 on 31250: starting
2011-10-08 09:26:59,155 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 67 on 31250: starting
2011-10-08 09:26:59,155 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 68 on 31250: starting
2011-10-08 09:26:59,155 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 69 on 31250: starting
2011-10-08 09:26:59,156 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 70 on 31250: starting
2011-10-08 09:26:59,156 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 71 on 31250: starting
2011-10-08 09:26:59,156 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 72 on 31250: starting
2011-10-08 09:26:59,156 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 73 on 31250: starting
2011-10-08 09:26:59,156 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 74 on 31250: starting
2011-10-08 09:26:59,156 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 75 on 31250: starting
2011-10-08 09:26:59,157 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 76 on 31250: starting
2011-10-08 09:26:59,157 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 77 on 31250: starting
2011-10-08 09:26:59,157 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 78 on 31250: starting
2011-10-08 09:26:59,157 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 79 on 31250: starting
2011-10-08 09:26:59,157 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 80 on 31250: starting
2011-10-08 09:26:59,157 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 81 on 31250: starting
2011-10-08 09:26:59,158 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 82 on 31250: starting
2011-10-08 09:26:59,158 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 83 on 31250: starting
2011-10-08 09:26:59,158 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 84 on 31250: starting
2011-10-08 09:26:59,158 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 85 on 31250: starting
2011-10-08 09:26:59,158 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 86 on 31250: starting
2011-10-08 09:26:59,158 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 87 on 31250: starting
2011-10-08 09:26:59,159 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 88 on 31250: starting
2011-10-08 09:26:59,159 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 89 on 31250: starting
2011-10-08 09:26:59,159 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 90 on 31250: starting
2011-10-08 09:26:59,159 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 91 on 31250: starting
2011-10-08 09:26:59,159 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 92 on 31250: starting
2011-10-08 09:26:59,159 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 93 on 31250: starting
2011-10-08 09:26:59,160 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 94 on 31250: starting
2011-10-08 09:26:59,160 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 95 on 31250: starting
2011-10-08 09:26:59,160 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 96 on 31250: starting
2011-10-08 09:26:59,160 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 97 on 31250: starting
2011-10-08 09:26:59,161 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 98 on 31250: starting
2011-10-08 09:26:59,161 INFO
org.apache.giraph.comm.BasicRPCCommunications: BasicRPCCommunications:
Started RPC communication server:
gsta33033.tan.ygrid.yahoo.com/10.216.176.59:31250 with 100 handlers
2011-10-08 09:26:59,161 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 99 on 31250: starting
2011-10-08 09:27:05,234 INFO
org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs'
truncater with mapRetainSize=102400 and reduceRetainSize=102400
2011-10-08 09:27:05,236 FATAL org.apache.hadoop.mapred.Child: Error
running child : java.lang.OutOfMemoryError: unable to create new
native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:597)
        at java.lang.UNIXProcess$1.run(UNIXProcess.java:141)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:103)
        at java.lang.ProcessImpl.start(ProcessImpl.java:65)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:200)
        at org.apache.hadoop.util.Shell.run(Shell.java:182)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:375)
        at org.apache.hadoop.util.Shell.execCommand(Shell.java:461)
        at org.apache.hadoop.util.Shell.execCommand(Shell.java:444)
        at org.apache.hadoop.fs.RawLocalFileSystem.execCommand(RawLocalFileSystem.java:540)
        at org.apache.hadoop.fs.RawLocalFileSystem.access$100(RawLocalFileSystem.java:37)
        at org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:417)
        at org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.getOwner(RawLocalFileSystem.java:400)
        at org.apache.hadoop.mapred.TaskLog.obtainLogDirOwner(TaskLog.java:275)
        at org.apache.hadoop.mapred.TaskLogsTruncater.truncateLogs(TaskLogsTruncater.java:124)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
        at org.apache.hadoop.mapred.Child.main(Child.java:255)

2011-10-08 09:27:05,272 INFO
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping MapTask
metrics system...
2011-10-08 09:27:05,272 INFO
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping metrics
source ugi(org.apache.hadoop.security.UgiInstrumentation)
2011-10-08 09:27:05,272 INFO
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping metrics
source jvm(org.apache.hadoop.metrics2.source.JvmMetricsSource)
2011-10-08 09:27:05,272 INFO
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping metrics
source 
RpcDetailedActivityForPort31250(org.apache.hadoop.ipc.metrics.RpcInstrumentation$Detailed)
2011-10-08 09:27:05,272 INFO
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping metrics
source RpcActivityForPort31250(org.apache.hadoop.ipc.metrics.RpcInstrumentation)
2011-10-08 09:27:05,272 INFO
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics
system stopped.


-- 
Best Regards
Zhiwei Gu
