[ https://issues.apache.org/jira/browse/MAPREDUCE-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrey Klochkov updated MAPREDUCE-5501:
---------------------------------------
    Attachment:     (was: handing-rmcontainer-allocator.stdout)

> RMContainer Allocator does not stop when cluster shutdown is performed in tests
> -------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5501
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5501
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: trunk
>            Reporter: Andrey Klochkov
>
> After running the MR job client tests, many MRAppMaster processes stay alive.
> The reason seems to be that the RMContainer Allocator thread ignores
> InterruptedException and keeps retrying:
> {code}
> 2013-09-09 18:52:07,505 WARN [RMCommunicator Allocator] org.apache.hadoop.util.ThreadUtil: interrupted while sleeping
> java.lang.InterruptedException: sleep interrupted
>     at java.lang.Thread.sleep(Native Method)
>     at org.apache.hadoop.util.ThreadUtil.sleepAtLeastIgnoreInterrupts(ThreadUtil.java:43)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:149)
>     at com.sun.proxy.$Proxy29.allocate(Unknown Source)
>     at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor.makeRemoteRequest(RMContainerRequestor.java:154)
>     at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:553)
>     at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:219)
>     at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$1.run(RMCommunicator.java:236)
>     at java.lang.Thread.run(Thread.java:680)
> 2013-09-09 18:52:37,639 INFO [RMCommunicator Allocator] org.apache.hadoop.ipc.Client: Retrying connect to server: dhcpx-197-141.corp.yahoo.com/10.73.197.141:61163. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
> 2013-09-09 18:52:38,640 INFO [RMCommunicator Allocator] org.apache.hadoop.ipc.Client: Retrying connect to server: dhcpx-197-141.corp.yahoo.com/10.73.197.141:61163. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
> {code}
> It takes more than 6 minutes for the processes to die, and this causes
> various issues with tests which use the same DFS dir.
> {code}
> 2013-09-09 22:26:47,179 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Error communicating with RM: Could not contact RM after 360000 milliseconds.
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Could not contact RM after 360000 milliseconds.
>     at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:563)
>     at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:219)
>     at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$1.run(RMCommunicator.java:236)
>     at java.lang.Thread.run(Thread.java:680)
> {code}
> Will attach a thread dump separately.
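
For illustration only, the difference between a sleep that swallows InterruptedException and one that reports it can be sketched in plain Java. This is not the Hadoop code and not a proposed patch: the class InterruptAwareHeartbeat, its methods, and the thread name "Hypothetical Allocator" are made up for this sketch, and the loop only stands in for a periodic heartbeat like the one in the stack trace above.

```java
class InterruptAwareHeartbeat {

    /** Sleeps for the full duration and drops any interrupt (the problematic pattern). */
    static void sleepIgnoringInterrupts(long millis) {
        long end = System.nanoTime() + millis * 1_000_000L;
        long remainingNanos;
        while ((remainingNanos = end - System.nanoTime()) > 0) {
            try {
                Thread.sleep(Math.max(1, remainingNanos / 1_000_000L));
            } catch (InterruptedException e) {
                // Swallowed: the interrupt flag is cleared and the caller never
                // learns that a shutdown was requested.
            }
        }
    }

    /** Sleeps, but returns false as soon as the thread is interrupted. */
    static boolean sleepUnlessInterrupted(long millis) {
        try {
            Thread.sleep(millis);
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the flag for callers
            return false;
        }
    }

    /** Starts a hypothetical heartbeat loop; daemon so that this demo can exit. */
    static Thread startHeartbeat(long intervalMillis, boolean honorInterrupts) {
        Thread t = new Thread(() -> {
            while (true) {
                // A real allocator would contact the RM here.
                if (honorInterrupts) {
                    if (!sleepUnlessInterrupted(intervalMillis)) {
                        return; // shutdown requested: leave the loop
                    }
                } else {
                    sleepIgnoringInterrupts(intervalMillis);
                }
            }
        }, "Hypothetical Allocator");
        t.setDaemon(true);
        t.start();
        return t;
    }

    /** Interrupts the thread and reports whether it terminated within the timeout. */
    static boolean stopsWithin(Thread t, long timeoutMillis) {
        t.interrupt();
        try {
            t.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !t.isAlive();
    }

    public static void main(String[] args) {
        Thread ignoring = startHeartbeat(5_000, false);
        Thread honoring = startHeartbeat(5_000, true);
        System.out.println("ignoring variant stopped: " + stopsWithin(ignoring, 200));
        System.out.println("honoring variant stopped: " + stopsWithin(honoring, 2_000));
    }
}
```

In this sketch the interrupt-ignoring loop never observes the stop request, which mirrors the symptom described in the issue, while the interrupt-aware loop returns on the first interrupt.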