mynameborat commented on pull request #1452: URL: https://github.com/apache/samza/pull/1452#issuecomment-740003171
Symptom: When a new AM takes a long time to start up, the heartbeat thread of an already-running container silently dies and makes no further heartbeat requests to the new AM.

Cause: The AM URL key-value (yarn.am.tracking.url) is removed from the coordinator stream while the new AM is starting up, because this config is present in the old config (i.e. the coordinator stream) but not in the config generated by the new AM. The running container, which constantly fetches the value for this key, then gets null and throws an NPE.

Changes: When AMHA is enabled, do not remove this config.

Tests: Works with hello-samza. I am trying to write a unit test, but CoordinatorStreamUtil is tricky to mock and inject.
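
The change described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in the PR: the class `StaleConfigCleanup`, the method `keysToRemove`, and the `amHaEnabled` parameter are hypothetical names, and plain maps stand in for the coordinator stream. Only the key `yarn.am.tracking.url` comes from the description.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not Samza's actual implementation: it only illustrates
// pruning stale config keys while preserving the AM tracking URL under AM-HA.
final class StaleConfigCleanup {
    // Config key named in the PR description.
    static final String AM_TRACKING_URL_KEY = "yarn.am.tracking.url";

    /**
     * Returns the keys that would be deleted from the stored (old) config:
     * every key absent from newConfig, except the AM tracking URL when
     * amHaEnabled is true.
     */
    static Set<String> keysToRemove(Map<String, String> oldConfig,
                                    Map<String, String> newConfig,
                                    boolean amHaEnabled) {
        // Start from all old keys and drop the ones the new config still has.
        Set<String> stale = new HashSet<>(oldConfig.keySet());
        stale.removeAll(newConfig.keySet());
        if (amHaEnabled) {
            // Running containers keep reading this key, so leave it in place
            // until the new AM publishes its own value.
            stale.remove(AM_TRACKING_URL_KEY);
        }
        return stale;
    }
}
```

With `amHaEnabled` set to true, the tracking URL stays out of the removal set, so a container polling the key keeps seeing the old value rather than null. With it set to false, the key is pruned like any other stale entry.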
