[ 
https://issues.apache.org/jira/browse/SAMZA-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shanthoosh Venkataraman updated SAMZA-1506:
-------------------------------------------
    Summary: Potential orphaned containers problem in SamzaContainer.  (was: 
Potential orphaned containers  in LocalContainerRunner.)

> Potential orphaned containers problem in SamzaContainer.
> --------------------------------------------------------
>
>                 Key: SAMZA-1506
>                 URL: https://issues.apache.org/jira/browse/SAMZA-1506
>             Project: Samza
>          Issue Type: Bug
>            Reporter: Shanthoosh Venkataraman
>            Assignee: Abhishek Shivanna
>             Fix For: 0.14.0
>
>
> We observed an orphaned container in the LinkedIn production environment
> (using samza-yarn).
> The ContainerHeartbeatMonitor, added as part of SAMZA-871 to solve this
> problem, was still alive in the orphaned container's Java process but did
> not shut it down.
> ContainerHeartbeatMonitor uses a single-threaded ScheduledExecutorService to
> periodically check whether the container is orphaned.
> The following thread dump shows that the worker thread of the
> ScheduledExecutorService found the task queue empty and went into the
> waiting state (expecting new tasks to be added to the queue).
> {code:java}
> "Samza-ContainerHeartbeatMonitor-0" #34 prio=5 os_prio=0 tid=0x00007f9322896800 nid=0x38af waiting on condition [0x00007f92f363e000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x000000070078a0e8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1081)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
>         at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> If any execution of a Runnable submitted to
> ScheduledExecutorService.scheduleAtFixedRate throws an exception, subsequent
> executions are suppressed.
> The existing ContainerHeartBeatClient implementation, which accesses the
> ApplicationMaster HTTP endpoint to get container liveness, handles only
> IOException. Any unchecked exception thrown from that code path will
> silently stop the ContainerHeartbeatMonitor's periodic check (this is the
> suspected cause).
> This requires further investigation.
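
The suppression behaviour described above can be reproduced outside Samza with
a small standalone program. The following is only a sketch: the class
HeartbeatSuppressionDemo and the helpers guarded() and countRuns() are
hypothetical names made up for illustration and are not part of Samza, and the
failing task merely stands in for a heartbeat check that throws an unchecked
exception. The guarded() wrapper is one possible mitigation, not necessarily
the fix that will be applied for this issue.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class HeartbeatSuppressionDemo {

    // Hypothetical wrapper: catches everything the task throws so that the
    // periodic schedule is not cancelled. Catching Throwable is a deliberate
    // simplification for this sketch; real code may want to handle Errors
    // differently.
    static Runnable guarded(Runnable task) {
        return () -> {
            try {
                task.run();
            } catch (Throwable t) {
                System.err.println("periodic check failed: " + t);
            }
        };
    }

    // Schedules a task that throws an unchecked exception on every run and
    // returns how many times it was actually executed within ~300 ms.
    static int countRuns(boolean guard) throws InterruptedException {
        ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();
        Runnable failing = () -> {
            runs.incrementAndGet();
            throw new IllegalStateException("simulated unchecked failure");
        };
        scheduler.scheduleAtFixedRate(
            guard ? guarded(failing) : failing, 0, 20, TimeUnit.MILLISECONDS);
        // While we sleep, the unguarded variant leaves the worker thread idle,
        // parked on the empty task queue.
        Thread.sleep(300);
        scheduler.shutdownNow();
        return runs.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // The unguarded task runs once and is never rescheduled.
        System.out.println("unguarded runs: " + countRuns(false));
        // The guarded task keeps running on every period.
        System.out.println("guarded runs:   " + countRuns(true));
    }
}
```

In the unguarded case the task executes exactly once; afterwards the executor
stays alive with an empty queue and its worker thread parked, which is
consistent with the thread dump above.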



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
