Lakshmi Manasa Gaduputi created SAMZA-2728:
----------------------------------------------
Summary: [Elasticity] improve distribution of messages across
elastic tasks
Key: SAMZA-2728
URL: https://issues.apache.org/jira/browse/SAMZA-2728
Project: Samza
Issue Type: Improvement
Reporter: Lakshmi Manasa Gaduputi
Assignee: Lakshmi Manasa Gaduputi
Symptom: When elasticity is enabled, for certain kinds of input streams, some of
the containers do not process any messages when container count = elastic task
count = elasticity factor * original task count.
Cause: The input stream where this was observed had message keys such that
key.hashCode() % elasticityFactor was always even for some partitions and odd for
other partitions. This led to some of the elastic tasks not getting any
messages. This is not a bug in the elasticity code but rather a skew in the
input stream’s key distribution.
Fix: This can be fixed by changing the key bucket computation:
key.hashCode() % elasticityFactor is modified to
(key.hashCode() % 31) % elasticityFactor, and the max value for the elasticity
factor is limited to 16 so that 31 can be used safely.
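
The effect of the proposed change can be illustrated with a minimal sketch. This is not the Samza implementation: the class name KeyBucketSketch, the method names, and the use of Math.floorMod (to keep buckets non-negative when hashCode() is negative) are assumptions made for illustration. It feeds keys whose hash codes are all even into both computations with an elasticity factor of 2.

```java
import java.util.Arrays;

// Hypothetical illustration only; names and structure are not taken from Samza.
final class KeyBucketSketch {
    static final int PRIME = 31;                  // prime from the proposed fix
    static final int MAX_ELASTICITY_FACTOR = 16;  // limit stated in the ticket

    // Original computation: key.hashCode() % elasticityFactor.
    // floorMod is an assumption here, used to avoid negative buckets.
    static int originalBucket(Object key, int elasticityFactor) {
        return Math.floorMod(key.hashCode(), elasticityFactor);
    }

    // Proposed computation: (key.hashCode() % 31) % elasticityFactor.
    static int improvedBucket(Object key, int elasticityFactor) {
        if (elasticityFactor < 1 || elasticityFactor > MAX_ELASTICITY_FACTOR) {
            throw new IllegalArgumentException(
                "elasticity factor must be between 1 and " + MAX_ELASTICITY_FACTOR);
        }
        return Math.floorMod(Math.floorMod(key.hashCode(), PRIME), elasticityFactor);
    }

    // Counts how many of 1000 skewed keys land in each bucket.
    // Integer.hashCode() returns the int value, so every key below hashes to an even number.
    static int[] countBuckets(boolean improved, int elasticityFactor) {
        int[] counts = new int[elasticityFactor];
        for (int i = 0; i < 1000; i++) {
            Integer key = 2 * i;
            int bucket = improved
                ? improvedBucket(key, elasticityFactor)
                : originalBucket(key, elasticityFactor);
            counts[bucket]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // With the original computation, every key falls into bucket 0.
        System.out.println("original: " + Arrays.toString(countBuckets(false, 2)));
        // With the proposed computation, both buckets receive keys.
        System.out.println("improved: " + Arrays.toString(countBuckets(true, 2)));
    }
}
```

With the skewed keys above, the original computation leaves one of the two buckets empty, which matches the idle elastic tasks described in the symptom, while the modified computation sends keys to both buckets.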
--
This message was sent by Atlassian Jira
(v8.20.1#820001)