vjagadish1989 opened a new pull request #995: SAMZA-2165: Account for coordinator restarts in calls to status
URL: https://github.com/apache/samza/pull/995

Currently, the status of a Samza job is determined by a combination of the following steps:
1. Obtain YARN's status for the job by querying the RM.
2. Obtain the AM/coordinator URL for the job.
3. If (1) is "Running", query the coordinator URL to check whether all containers have started.

YARN may restart the coordinator between (2) and (3), in which case the old coordinator process may no longer be alive and (3) fails with a ConnectException. This causes the status call to fail. A better way to handle these retriable errors is to return a "New" status from the API, so that applications can continue polling for status.
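
The three steps and the proposed handling can be sketched roughly as follows. This is a minimal illustration, not the code in the pull request: the class, interface, and method names (JobStatusSketch, ResourceManagerClient, CoordinatorClient, getStatus) and the FINISHED status are hypothetical. Returning "New" while containers are still starting is also an assumption, since the description only specifies the ConnectException case.

```java
import java.io.IOException;
import java.net.ConnectException;

/** Hypothetical sketch; none of these names are Samza's actual API. */
final class JobStatusSketch {

    /** Only "Running" and "New" are named in the description; FINISHED is illustrative. */
    enum Status { NEW, RUNNING, FINISHED }

    /** Stand-in for the calls made against the YARN RM (steps 1 and 2). */
    interface ResourceManagerClient {
        Status yarnStatus(String appId);
        String coordinatorUrl(String appId);
    }

    /** Stand-in for the HTTP call made against the coordinator (step 3). */
    interface CoordinatorClient {
        boolean allContainersStarted(String coordinatorUrl) throws IOException;
    }

    static Status getStatus(String appId,
                            ResourceManagerClient rm,
                            CoordinatorClient coordinator) throws IOException {
        // Step 1: ask the RM for YARN's view of the job.
        Status yarnStatus = rm.yarnStatus(appId);
        // Step 2: look up the AM/coordinator URL.
        String url = rm.coordinatorUrl(appId);

        if (yarnStatus != Status.RUNNING) {
            return yarnStatus;
        }
        try {
            // Step 3: the URL from step 2 may already be stale at this point.
            // Assumption: report NEW until every container has started.
            return coordinator.allContainersStarted(url) ? Status.RUNNING : Status.NEW;
        } catch (ConnectException e) {
            // The coordinator was probably restarted between steps 2 and 3.
            // Treat this as retriable and report NEW so that callers keep polling.
            return Status.NEW;
        }
    }
}
```

Only ConnectException is caught in this sketch; any other IOException still propagates to the caller, so failures that are not retriable remain visible.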
