[
https://issues.apache.org/jira/browse/SAMZA-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shanthoosh Venkataraman updated SAMZA-1506:
-------------------------------------------
Summary: Potential orphaned containers problem in SamzaContainer. (was:
Potential orphaned containers in LocalContainerRunner.)
> Potential orphaned containers problem in SamzaContainer.
> --------------------------------------------------------
>
> Key: SAMZA-1506
> URL: https://issues.apache.org/jira/browse/SAMZA-1506
> Project: Samza
> Issue Type: Bug
> Reporter: Shanthoosh Venkataraman
> Assignee: Abhishek Shivanna
> Fix For: 0.14.0
>
>
> We noticed an orphaned container in the LinkedIn production
> environment (using samza-yarn).
> The ContainerHeartbeatMonitor, added as part of SAMZA-871 to solve this
> problem, was still alive in the orphaned container's Java process but did
> not shut it down.
> ContainerHeartbeatMonitor uses a single-threaded ScheduledExecutorService to
> periodically check whether the container is orphaned.
> The following thread dump shows that the worker thread in the
> ScheduledExecutorService found the task queue empty and went into a waiting
> state (expecting new tasks to be added to the queue).
> {code:java}
> "Samza-ContainerHeartbeatMonitor-0" #34 prio=5 os_prio=0 tid=0x00007f9322896800 nid=0x38af waiting on condition [0x00007f92f363e000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for <0x000000070078a0e8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1081)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
>         at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> If the execution of a Runnable submitted to
> ScheduledExecutorService.scheduleAtFixedRate throws an exception, subsequent
> executions are suppressed.
> The existing ContainerHeartBeatClient implementation, which calls the
> ApplicationMaster HTTP endpoint to check container liveness, handles only
> IOException. Any unchecked exception thrown from that code path will
> silently stop the ContainerHeartbeatMonitor's periodic check (this is the
> suspected cause).
> This requires further investigation.
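
The suppression behaviour described above can be reproduced outside Samza. The following is a minimal sketch, not the actual ContainerHeartbeatMonitor code: the class name HeartbeatSuppressionSketch, the method countRuns, and the simulated failure on the second run are all hypothetical and exist only for illustration. It schedules a periodic check on a single-threaded ScheduledExecutorService and compares an unguarded Runnable with one that catches unchecked exceptions.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; it does not use any Samza code.
class HeartbeatSuppressionSketch {

    /**
     * Schedules a periodic check that throws an unchecked exception on its
     * second run, waits for durationMs, and returns how often the check ran.
     */
    static int countRuns(boolean guarded, long periodMs, long durationMs) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();

        // Stand-in for the liveness check; the failure is simulated.
        Runnable check = () -> {
            if (runs.incrementAndGet() == 2) {
                throw new IllegalStateException("simulated unchecked failure");
            }
        };

        Runnable scheduled;
        if (guarded) {
            // Guarded variant: the exception never escapes run(), so the
            // executor keeps scheduling later executions.
            scheduled = () -> {
                try {
                    check.run();
                } catch (RuntimeException e) {
                    System.err.println("check failed, will retry: " + e.getMessage());
                }
            };
        } else {
            // Unguarded variant: the escaping exception cancels the periodic task.
            scheduled = check;
        }

        scheduler.scheduleAtFixedRate(scheduled, 0, periodMs, TimeUnit.MILLISECONDS);
        try {
            Thread.sleep(durationMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        scheduler.shutdownNow();
        return runs.get();
    }

    public static void main(String[] args) {
        System.out.println("unguarded runs: " + countRuns(false, 20, 300));
        System.out.println("guarded runs:   " + countRuns(true, 20, 300));
    }
}
```

In the unguarded case the check stops after its second run and the worker thread parks on the empty queue, which matches the thread dump above. Whether a real fix should retry, as in this sketch, or treat the failure as a reason to shut the container down is a design decision this example does not settle.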