lakshmi-manasa-g opened a new pull request #1589:
URL: https://github.com/apache/samza/pull/1589


   **Symptom:** When elasticity is enabled, for certain kinds of input streams, 
some of the containers do not process any messages when container count = 
elastic task count = elasticity factor × original task count.
   
   **Cause:** The input stream where this was observed had message keys 
such that key.hashCode() % elasticityFactor was always even for some partitions 
and always odd for others. As a result, some of the elastic tasks never received 
any messages. This is not a bug in the elasticity code but rather a skew in the 
input stream’s key distribution.
   
   **Changes:** 
   1. To distribute messages across elastic tasks more evenly, the key bucket 
computation, previously key.hashCode() % elasticityFactor, is changed to 
(key.hashCode() % 31) % elasticityFactor, and the maximum elasticity factor is 
limited to 16 so that % 31 can be used safely.
   2. As a side effect, EOS message handling was found to be incorrect. It is 
now fixed so that every key bucket of the EOS message’s SSP is removed from the 
task’s processing set. This surfaced in tests that had passed earlier only 
because EOS.hashCode() % elasticityFactor happened to coincide with the task’s 
processing SSP in the test setup.
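
   The effect of change 1 can be illustrated with a small standalone sketch. It is not the code from this PR: the class `KeyBucketSketch`, its method names, and the sample keys are hypothetical, and the use of `Math.floorMod` to keep buckets non-negative and the lower bound of 1 on the factor are assumptions made for the example. The sample keys are integers whose hash codes are all even, which is one way the skew described above can arise.

```java
import java.util.Arrays;

// Hypothetical illustration only; not the Samza implementation.
final class KeyBucketSketch {
    // Upper bound taken from the PR description.
    static final int MAX_ELASTICITY_FACTOR = 16;

    // Previous scheme: hash code modulo the elasticity factor.
    // floorMod is an assumption here, used to avoid negative buckets.
    static int oldBucket(Object key, int elasticityFactor) {
        return Math.floorMod(key.hashCode(), elasticityFactor);
    }

    // New scheme: reduce modulo 31 first, then modulo the elasticity factor.
    static int newBucket(Object key, int elasticityFactor) {
        // The lower bound of 1 is an assumption; the PR only states the maximum.
        if (elasticityFactor < 1 || elasticityFactor > MAX_ELASTICITY_FACTOR) {
            throw new IllegalArgumentException(
                "elasticity factor must be between 1 and " + MAX_ELASTICITY_FACTOR);
        }
        return Math.floorMod(Math.floorMod(key.hashCode(), 31), elasticityFactor);
    }

    // Counts how many keys land in each bucket under the chosen scheme.
    static int[] countBuckets(int[] keys, int elasticityFactor, boolean useNewScheme) {
        int[] counts = new int[elasticityFactor];
        for (int key : keys) {
            int bucket = useNewScheme
                ? newBucket(key, elasticityFactor)
                : oldBucket(key, elasticityFactor);
            counts[bucket]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Integer.hashCode() is the value itself, so every hash code here is even.
        int[] keys = {0, 8, 16, 24, 32, 40, 48, 56};
        System.out.println(Arrays.toString(countBuckets(keys, 2, false))); // [8, 0]
        System.out.println(Arrays.toString(countBuckets(keys, 2, true)));  // [4, 4]
    }
}
```

   With an elasticity factor of 2, the previous scheme sends all eight sample keys to bucket 0, so the elastic task for bucket 1 would stay idle, while the new scheme splits them evenly between the two buckets.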
   
   **Tests:** Added unit tests to check the elasticity factor range.
   
   **API changes:** None
   
   **Usage/Upgrade instructions:** None

