Jagadish created SAMZA-2165:
-------------------------------

             Summary: Account for coordinator restarts in calls to status
                 Key: SAMZA-2165
                 URL: https://issues.apache.org/jira/browse/SAMZA-2165
             Project: Samza
          Issue Type: Bug
            Reporter: Jagadish
            Assignee: Jagadish


Currently, the status of a Samza job is determined by a combination of:
1. Obtaining YARN's status for the job by querying the RM
2. Obtaining the AM/coordinator URL for the job
3. If (1) is "Running", querying the job's coordinator URL to check whether all
containers have started
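The three steps above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not Samza's actual code: `StatusCheckSketch`, `ResourceManagerClient`, `CoordinatorClient` and the `Status` enum are made-up names, and the sketch assumes that a job YARN reports as Running, but whose containers have not all started, is reported as New. It mainly shows where a ConnectException raised in step (3) escapes to the caller.

```java
import java.net.ConnectException;

// Hypothetical sketch of the current three-step status check; not Samza's actual code.
final class StatusCheckSketch {
    enum Status { NEW, RUNNING, SUCCESSFUL_FINISH, UNSUCCESSFUL_FINISH }

    // Hypothetical wrapper around the YARN ResourceManager (RM).
    interface ResourceManagerClient {
        Status yarnStatus(String appId);      // step (1)
        String coordinatorUrl(String appId);  // step (2): AM/coordinator URL
    }

    // Hypothetical client for the coordinator's HTTP endpoint.
    interface CoordinatorClient {
        // step (3): throws ConnectException if nothing is listening at the URL
        boolean allContainersStarted(String coordinatorUrl) throws ConnectException;
    }

    static Status getStatus(String appId, ResourceManagerClient rm, CoordinatorClient coordinator)
            throws ConnectException {
        Status yarnStatus = rm.yarnStatus(appId);
        String url = rm.coordinatorUrl(appId);
        if (yarnStatus != Status.RUNNING) {
            return yarnStatus;
        }
        // If YARN restarted the coordinator after step (2), `url` points at a dead
        // process and the ConnectException propagates, failing the whole status call.
        return coordinator.allContainersStarted(url) ? Status.RUNNING : Status.NEW;
    }
}
```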

YARN may restart the coordinator between (2) and (3), and the old coordinator
process may no longer be alive, triggering a ConnectException in (3). This
causes the status call to fail.

A better way to handle these retriable errors is to return a "New" status
from the API, so that applications can keep polling.
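A minimal sketch of the proposed behavior, for illustration only: `RetriableStatusSketch`, `CoordinatorQuery`, `runningJobStatus` and `pollUntilNotNew` are hypothetical names and do not correspond to Samza's API. It assumes YARN already reports the job as Running and that each query re-resolves the coordinator URL, so a restarted coordinator can be reached on a later poll. A ConnectException from the coordinator is mapped to New instead of propagating, and a caller-side loop keeps polling while the status is New.

```java
import java.net.ConnectException;

// Hypothetical sketch of the proposed fix; names are illustrative, not Samza's API.
final class RetriableStatusSketch {
    enum Status { NEW, RUNNING }

    // Hypothetical query that re-resolves the coordinator URL (step 2) on every call,
    // so a coordinator restarted by YARN is found on a later poll.
    interface CoordinatorQuery {
        boolean allContainersStarted() throws ConnectException;
    }

    // Status for a job that YARN already reports as Running.
    static Status runningJobStatus(CoordinatorQuery query) {
        try {
            return query.allContainersStarted() ? Status.RUNNING : Status.NEW;
        } catch (ConnectException e) {
            // The coordinator may have been restarted by YARN; report New so the
            // caller polls again instead of failing.
            return Status.NEW;
        }
    }

    // Caller-side polling: retry while the status is New, up to maxAttempts times.
    static Status pollUntilNotNew(CoordinatorQuery query, int maxAttempts, long sleepMillis) {
        Status status = Status.NEW;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            status = runningJobStatus(query);
            if (status != Status.NEW || attempt == maxAttempts) {
                break;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                break;
            }
        }
        return status;
    }
}
```

This sketch treats only ConnectException as retriable, so other failures still surface to the caller; broader I/O errors such as timeouts could be mapped to New the same way if they are also considered retriable.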



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
