Jagadish created SAMZA-2165:
-------------------------------
Summary: Account for coordinator restarts in calls to status
Key: SAMZA-2165
URL: https://issues.apache.org/jira/browse/SAMZA-2165
Project: Samza
Issue Type: Bug
Reporter: Jagadish
Assignee: Jagadish
Currently, the status of a Samza job is determined by a combination of:
1. Obtaining YARN's status for the job by querying the RM
2. Obtaining the AM/coordinator URL for the job
3. If (1) is "Running", querying the job's coordinator URL to check whether all
containers have started
YARN may restart the coordinator between (2) and (3), in which case the old
coordinator process may no longer be alive and the query in (3) triggers a
ConnectException. This causes the status call to fail.
A better way to handle these retriable errors is to return a "New" status from
the API, so that applications can keep polling.
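
As a rough illustration of the proposed behaviour, here is a minimal sketch in
Java. It is not the actual Samza code: StatusSketch, JobStatus,
ResourceManagerClient, CoordinatorClient and getStatus are all hypothetical
names invented for this example, and the two interfaces merely stand in for the
RM query and the HTTP call to the coordinator. The only point it makes is that
a ConnectException in step (3) is mapped to "New" instead of propagating.

```java
import java.net.ConnectException;

public class StatusSketch {
    /** Hypothetical status values, not Samza's actual status type. */
    public enum JobStatus { NEW, RUNNING, FINISHED }

    /** Hypothetical stand-in for the RM queries in steps (1) and (2). */
    public interface ResourceManagerClient {
        JobStatus yarnStatus(String jobId);     // step (1)
        String coordinatorUrl(String jobId);    // step (2)
    }

    /** Hypothetical stand-in for the call to the coordinator in step (3). */
    public interface CoordinatorClient {
        boolean allContainersStarted(String coordinatorUrl) throws ConnectException;
    }

    public static JobStatus getStatus(String jobId,
                                      ResourceManagerClient rm,
                                      CoordinatorClient coordinator) {
        JobStatus yarnStatus = rm.yarnStatus(jobId);   // (1)
        String url = rm.coordinatorUrl(jobId);         // (2)
        if (yarnStatus != JobStatus.RUNNING) {
            return yarnStatus;
        }
        try {
            // (3) the job only counts as running once all containers are up
            return coordinator.allContainersStarted(url)
                    ? JobStatus.RUNNING
                    : JobStatus.NEW;
        } catch (ConnectException e) {
            // The coordinator may have been restarted between (2) and (3),
            // so the URL can point at a dead process. Treat this as
            // retriable and report NEW so that callers keep polling.
            return JobStatus.NEW;
        }
    }

    public static void main(String[] args) {
        ResourceManagerClient rm = new ResourceManagerClient() {
            @Override
            public JobStatus yarnStatus(String jobId) {
                return JobStatus.RUNNING;
            }

            @Override
            public String coordinatorUrl(String jobId) {
                return "http://old-coordinator.invalid:12345"; // made-up URL
            }
        };
        // Simulate a coordinator that was restarted after its URL was fetched.
        CoordinatorClient restarted = url -> {
            throw new ConnectException("Connection refused");
        };
        System.out.println(getStatus("example-job", rm, restarted));
    }
}
```

With the simulated restart in main, the call returns NEW rather than throwing,
so a polling client would simply try again on its next iteration.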