[
https://issues.apache.org/jira/browse/MAPREDUCE-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147008#comment-13147008
]
Vinod Kumar Vavilapalli commented on MAPREDUCE-3333:
----------------------------------------------------
bq. The close call shouldn't really be required with the idle time set to 0.
My idea was to actually remove the maxIdleTime setting once the root issue
HADOOP-7317 is fixed. I'll let it be.
bq. Should RPCClientFactoryPBImpl call RPC.stopProxy ? instead of putting it in
all the service client impls? It's a PB specific factory, so putting it here
should be ok.
No, that isn't possible. We need access to the proxy object in each impl. That's
the bane of the multiple layers in this part of the code.
bq. Otherwise - the Exception in stopClient() should not be ignored.
Sure, I'll throw an exception so that it is clear if somebody calls stopClient()
for a protocol that doesn't implement it.
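As a rough illustration of that last point (a minimal sketch, not the code in the patch): the factory can look up a close() method on the client impl reflectively and throw instead of silently ignoring a failure. The names ClosableClient, PlainClient and StopClientSketch are hypothetical stand-ins, and IllegalStateException is an assumed choice of exception type. In the real impl, close() is where the call to RPC.stopProxy on the wrapped proxy would live.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical stand-in for a per-protocol client impl that owns its proxy.
class ClosableClient {
    boolean closed = false;

    // Assumption: the real impl would release its RPC proxy here.
    public void close() {
        closed = true;
    }
}

// Hypothetical client for a protocol whose impl offers no close().
class PlainClient {
}

class StopClientSketch {
    // Invokes close() reflectively and fails loudly when it is missing or throws.
    static void stopClient(Object client) {
        try {
            Method close = client.getClass().getMethod("close");
            close.invoke(client);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(
                client.getClass().getName() + " does not implement close()", e);
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException(
                "close() failed on " + client.getClass().getName(), e);
        }
    }

    public static void main(String[] args) {
        ClosableClient ok = new ClosableClient();
        stopClient(ok);
        System.out.println("closed = " + ok.closed);

        try {
            stopClient(new PlainClient());
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With this shape, a caller that stops a client for a protocol without close() sees the failure immediately rather than leaking the connection unnoticed.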
bq. The client cache (removed by the patch) in ContainerLauncherImpl would
still be useful in non-secure mode. This works for both though - so isn't high
priority. Maybe a separate jira.
Sure, but it helps to have the same implementation for both modes. We can file
a separate JIRA if someone needs it.
bq. Forgot to mention - nice clean workaround to the rpc stop not working.
Thought it'd be way more involved.
Yeah, I've been running with this workaround for nearly a week but didn't put it
in the patch in the hope of fixing the root cause. Turns out it is the only
short-term solution, alas.
> MR AM for sort-job going out of memory
> --------------------------------------
>
> Key: MAPREDUCE-3333
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3333
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: applicationmaster, mrv2
> Affects Versions: 0.23.0
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Vinod Kumar Vavilapalli
> Priority: Blocker
> Fix For: 0.23.1
>
> Attachments: MAPREDUCE-3333-20111102.txt,
> MAPREDUCE-3333-20111108.txt, MAPREDUCE-3333-20111109.txt
>
>
> [~Karams] just found this. The usual sort job on a 350-node cluster hung due
> to an OutOfMemoryError and eventually failed after an hour instead of the usual
> 20-odd minutes.
> {code}
> 2011-11-02 11:40:36,438 ERROR [ContainerLauncher #258] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container launch failed for container_1320233407485_0002_01_001434 : java.lang.reflect.UndeclaredThrowableException
>     at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:88)
>     at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:290)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
> Caused by: com.google.protobuf.ServiceException: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: "gsbl91281.blue.ygrid.yahoo.com/98.137.101.189"; destination host is: ""gsbl91525.blue.ygrid.yahoo.com":45450;
>     at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:139)
>     at $Proxy20.startContainer(Unknown Source)
>     at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:81)
>     ... 4 more
> Caused by: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: "gsbl91281.blue.ygrid.yahoo.com/98.137.101.189"; destination host is: ""gsbl91525.blue.ygrid.yahoo.com":45450;
>     at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:655)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1089)
>     at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:136)
>     ... 6 more
> Caused by: java.io.IOException: Couldn't set up IO streams
>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:621)
>     at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:205)
>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:1195)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1065)
>     ... 7 more
> Caused by: java.lang.OutOfMemoryError: unable to create new native thread
>     at java.lang.Thread.start0(Native Method)
>     at java.lang.Thread.start(Thread.java:597)
>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:614)
>     ... 10 more
> {code}