Lakshmi Manasa Gaduputi created SAMZA-2728:
----------------------------------------------

             Summary: [Elasticity] improve distribution of messages across 
elastic tasks
                 Key: SAMZA-2728
                 URL: https://issues.apache.org/jira/browse/SAMZA-2728
             Project: Samza
          Issue Type: Improvement
            Reporter: Lakshmi Manasa Gaduputi
            Assignee: Lakshmi Manasa Gaduputi


Symptom: When elasticity is enabled, for certain kinds of input streams, some 
of the containers do not process any messages when container count = elastic 
task count = elasticity factor x original task count.

Cause: The input stream where this was observed had message keys such that 
key.hashCode()%elasticityFactor was always even for some partitions and odd for 
other partitions. This led to some of the elastic tasks not getting any 
messages. This is not a bug in the elasticity code but rather a skew in the 
input stream’s key distribution.

Fix: This can be addressed in the key bucket computation: 
key.hashCode()%elasticityFactor is changed to 
(key.hashCode()%31)%elasticityFactor, and the max value for elasticity factor 
is limited to 16 so that 31 can be used safely.
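
The effect of the proposed change can be illustrated with a small, 
self-contained sketch. This is not the Samza implementation: the class and 
method names are hypothetical, the input is a synthetic set of even hash 
values, and the use of Math.floorMod to keep buckets non-negative is an 
assumption made for this example.

```java
import java.util.Arrays;

// Hypothetical sketch, not Samza code: compares the two bucket formulas.
class KeyBucketSketch {
    static final int PRIME = 31;                 // modulus named in the issue
    static final int MAX_ELASTICITY_FACTOR = 16; // cap named in the issue

    // Current scheme: bucket taken directly from the key hash.
    // floorMod (an assumption here) keeps negative hashes in range.
    static int naiveBucket(int keyHash, int elasticityFactor) {
        return Math.floorMod(keyHash, elasticityFactor);
    }

    // Proposed scheme: reduce the hash modulo 31 first, then bucket.
    static int primeBucket(int keyHash, int elasticityFactor) {
        if (elasticityFactor < 1 || elasticityFactor > MAX_ELASTICITY_FACTOR) {
            throw new IllegalArgumentException("elasticity factor must be in [1, 16]");
        }
        return Math.floorMod(keyHash, PRIME) % elasticityFactor;
    }

    // Counts how many hashes land in each bucket under either scheme.
    static int[] countBuckets(int[] hashes, int elasticityFactor, boolean usePrime) {
        int[] counts = new int[elasticityFactor];
        for (int h : hashes) {
            int bucket = usePrime ? primeBucket(h, elasticityFactor)
                                  : naiveBucket(h, elasticityFactor);
            counts[bucket]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Synthetic skewed input: every key hash is even.
        int[] evenHashes = new int[1000];
        for (int i = 0; i < evenHashes.length; i++) {
            evenHashes[i] = 2 * i;
        }
        int factor = 2;
        System.out.println("naive: " + Arrays.toString(countBuckets(evenHashes, factor, false)));
        System.out.println("prime: " + Arrays.toString(countBuckets(evenHashes, factor, true)));
    }
}
```

With this input the direct modulo puts every key in bucket 0, while the 
two-step modulo places keys in both buckets. Because 31 residues cannot be 
split evenly across the buckets, the result is close to uniform rather than 
exactly uniform.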



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
