Sudarshan Vasudevan created GOBBLIN-1099:
--------------------------------------------

             Summary: Handle orphaned Yarn containers in Gobblin-on-Yarn 
clusters
                 Key: GOBBLIN-1099
                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1099
             Project: Apache Gobblin
          Issue Type: Improvement
          Components: gobblin-yarn
    Affects Versions: 0.15.0
            Reporter: Sudarshan Vasudevan
            Assignee: Abhishek Tiwari
             Fix For: 0.15.0


A Yarn application may leave behind orphaned containers, for example when node 
managers are lost. These orphaned containers can nevertheless keep running 
(potentially forever) as participants in the Helix cluster. This can cause the 
following problems for a Gobblin-on-Yarn application:
 # Double publish of data and commit of state
 # Task failures and partition starvation during application restarts, as Helix 
may assign tasks to orphaned containers, which hold stale state and 
configuration
 # Container failures on application restarts due to Helix instance name 
collisions with orphaned containers
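
One way to picture the detection problem is sketched below. This is only an 
illustration under stated assumptions, not the fix adopted for this issue: it 
assumes each live Helix participant can be mapped to the Yarn container ID it 
runs in, and that the application master knows the container IDs of its current 
attempt. The class OrphanCheck, the method findOrphans, and all instance names 
and container IDs are hypothetical. No Helix or Yarn API is called.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class OrphanCheck {

    /**
     * Hypothetical helper: returns the Helix instance names whose backing
     * container is not among the containers of the current application attempt.
     *
     * @param liveParticipants    Helix instance name -> container ID (assumed to be available)
     * @param currentContainerIds container IDs allocated to the current attempt
     */
    static Set<String> findOrphans(Map<String, String> liveParticipants,
                                   Set<String> currentContainerIds) {
        Set<String> orphans = new TreeSet<>();
        for (Map.Entry<String, String> entry : liveParticipants.entrySet()) {
            // A participant is treated as orphaned when its container is unknown
            // to the current attempt, even if its instance name looks valid.
            if (!currentContainerIds.contains(entry.getValue())) {
                orphans.add(entry.getKey());
            }
        }
        return orphans;
    }

    public static void main(String[] args) {
        // Illustrative values only.
        Map<String, String> live = Map.of(
                "GobblinYarnTaskRunner_1", "container_attempt1_000002",
                "GobblinYarnTaskRunner_2", "container_attempt2_000002",
                "GobblinYarnTaskRunner_3", "container_attempt2_000003");
        Set<String> current = Set.of(
                "container_attempt2_000002",
                "container_attempt2_000003");

        System.out.println(findOrphans(live, current));
    }
}
```

Comparing container IDs rather than instance names matters here because, as 
the third item notes, an orphaned container may hold the same Helix instance 
name that a restarted application tries to reuse.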

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
