lakshmi-manasa-g opened a new pull request #1589: URL: https://github.com/apache/samza/pull/1589
**Symptom:** When elasticity is enabled, for certain kinds of input streams, some containers do not process any messages when container count = elastic task count = elasticity factor x original task count.

**Cause:** The input stream where this was observed had message keys such that key.hashCode() % elasticityFactor was always even for some partitions and always odd for others. This led to some of the elastic tasks not getting any messages. This is not a bug in the elasticity code but rather a skew in the input stream's key distribution.

**Changes:**
1. To distribute messages to elastic tasks more evenly, the key bucket computation, key.hashCode() % elasticityFactor, is changed to (key.hashCode() % 31) % elasticityFactor, and the maximum elasticity factor is limited to 16 so that % 31 can be used safely.
2. As a side effect, EOS message handling was found to be incorrect and was fixed to remove any key bucket of the EOS message's SSP from the task's processing set. This was discovered in tests, which had not failed earlier only because EOS.hashCode() % elasticityFactor happened to coincide with the task's processing SSP in the test setup.

**Tests:** Added unit tests to check the elasticity factor range.

**API changes:** None

**Usage/Upgrade instructions:** None
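The effect of the key bucket change can be illustrated with a small standalone sketch. It is not the code from this PR: the class and method names (`KeyBucketSketch`, `oldBucket`, `newBucket`, `histogram`) are hypothetical, and the use of `Math.floorMod` to keep buckets non-negative is an assumption, since the description above does not say how negative hash codes are handled. The sketch feeds a partition whose keys all have even hash codes through both computations with an elasticity factor of 2.

```java
import java.util.Arrays;

public class KeyBucketSketch {
    // Bucket computation before the change: hash modulo the elasticity factor.
    // floorMod is an assumption made here to avoid negative buckets.
    static int oldBucket(Object key, int elasticityFactor) {
        return Math.floorMod(key.hashCode(), elasticityFactor);
    }

    // Bucket computation after the change: reduce modulo 31 first,
    // then modulo the elasticity factor.
    static int newBucket(Object key, int elasticityFactor) {
        return Math.floorMod(key.hashCode(), 31) % elasticityFactor;
    }

    // Counts how many of the given keys land in each bucket.
    static int[] histogram(int[] keys, int elasticityFactor, boolean useNew) {
        int[] counts = new int[elasticityFactor];
        for (int key : keys) {
            int bucket = useNew
                    ? newBucket(key, elasticityFactor)
                    : oldBucket(key, elasticityFactor);
            counts[bucket]++;
        }
        return counts;
    }

    // Builds a skewed key set: every key has an even hash code
    // (Integer.hashCode() returns the int value itself).
    static int[] evenKeys(int count) {
        int[] keys = new int[count];
        for (int i = 0; i < count; i++) {
            keys[i] = 2 * i;
        }
        return keys;
    }

    public static void main(String[] args) {
        int[] keys = evenKeys(62);
        System.out.println("old: " + Arrays.toString(histogram(keys, 2, false)));
        System.out.println("new: " + Arrays.toString(histogram(keys, 2, true)));
    }
}
```

With the old computation every key in this partition falls into bucket 0, so the elastic task that owns bucket 1 receives nothing. With the extra % 31 step both buckets receive messages, because reducing by a prime first breaks the alignment between the keys' parity and the elasticity factor.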
