GitHub user vjagadish1989 opened a pull request: https://github.com/apache/samza/pull/243
SAMZA-1359: Handle phantom container notifications cleanly during an RM fail-over

1. Improved our container handling logic to be resilient to phantom notifications.
2. Added a new metric to Samza's ContainerProcessManager module that tracks the number of such invalid notifications.
3. Added a couple of tests that simulate the exact scenario we encountered during the cluster upgrade (container starts -> container fails -> legitimate notification for the failure -> container re-start -> RM fail-over -> phantom notification with a different exit code).
4. As an aside, a whole bunch of tests in ContainerProcessManager relied on Thread.sleep to ensure that threads got to run in a certain order. Removed this non-determinism and made them predictable.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vjagadish1989/samza am-bug

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/samza/pull/243.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #243

----
commit 8ecf9dd847cea627aa788def871291d663ad9062
Author: Jagadish Venkatraman <jvenk...@jvenkatr-mn2.linkedin.biz>
Date:   2017-07-13T18:32:27Z

    changes for phantom notification

commit 262f420104aea69099e03b8d0ae33883f463088f
Author: Jagadish Venkatraman <jvenk...@jvenkatr-mn2.linkedin.biz>
Date:   2017-07-13T18:57:47Z

    fix bug

commit 8646d729691468a714b819d2c15dc62b9f3f8b16
Author: Jagadish Venkatraman <jvenk...@jvenkatr-mn2.linkedin.biz>
Date:   2017-07-13T18:59:33Z

    bug fixes

commit 8ed7a3662376bb466679d10f121857bcc3ee1a30
Author: Jagadish Venkatraman <jvenk...@jvenkatr-mn2.linkedin.biz>
Date:   2017-07-13T19:08:02Z

    fix typos

commit beb8b0650b7c68748d2369a41a0e149c8a31667b
Author: Jagadish Venkatraman <jvenk...@jvenkatr-mn2.linkedin.biz>
Date:   2017-07-14T17:09:06Z

    typos, and java doc fixes

----

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
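
For readers unfamiliar with the failure mode in SAMZA-1359, here is a minimal sketch of how a phantom completion notification might be detected and counted. It is not the code from this pull request: the class PhantomNotificationSketch, its methods, and the container and processor IDs are all hypothetical, and it assumes only that the manager tracks which container IDs it currently believes are running.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; these names are NOT Samza's actual classes or methods.
final class PhantomNotificationSketch {
    // Containers the manager currently believes are running: container ID -> processor ID.
    private final Map<String, String> running = new HashMap<>();
    // Counts completion notifications that match no running container.
    private final AtomicInteger invalidNotifications = new AtomicInteger();
    // Counts restarts requested after legitimate failures.
    private int restartsRequested = 0;

    void onContainerStarted(String containerId, String processorId) {
        running.put(containerId, processorId);
    }

    // Returns true if the notification was handled as a legitimate completion,
    // false if it was ignored as a phantom.
    boolean onContainerCompleted(String containerId, int exitStatus) {
        String processorId = running.remove(containerId);
        if (processorId == null) {
            // Unknown or already-handled container: count it and do nothing else.
            invalidNotifications.incrementAndGet();
            return false;
        }
        if (exitStatus != 0) {
            restartsRequested++; // a real manager would request a new container here
        }
        return true;
    }

    int invalidNotifications() { return invalidNotifications.get(); }
    int restartsRequested() { return restartsRequested; }
    boolean isRunning(String containerId) { return running.containsKey(containerId); }

    public static void main(String[] args) {
        PhantomNotificationSketch manager = new PhantomNotificationSketch();
        manager.onContainerStarted("container_1", "processor_0");
        manager.onContainerCompleted("container_1", 1);            // legitimate failure
        manager.onContainerStarted("container_2", "processor_0");  // restart
        // After an RM fail-over, the old failure is reported again with a different exit code.
        boolean handled = manager.onContainerCompleted("container_1", -100);
        System.out.println("handled=" + handled
            + ", invalid=" + manager.invalidNotifications()
            + ", restarts=" + manager.restartsRequested());
    }
}
```

Running main replays the scenario from item 3: the second notification for container_1 is ignored, the invalid-notification counter rises to 1, and only one restart is requested, so container_2 keeps running.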