lakshmi-manasa-g opened a new pull request #1576:
URL: https://github.com/apache/samza/pull/1576
**Feature:** Elasticity (SAMZA-2687) allows a Samza job to have more
tasks than the number of input SystemStreamPartitions (SSPs). Thus, a job can
scale up beyond its input partition count without needing to repartition the
input stream.
- This is achieved with elastic tasks. An elastic task is the same as a task
for all practical purposes, except that it consumes only a subset of the
messages of an SSP.
- With an elasticity factor F (integer), the number of elastic tasks will be
F times N with N = original task count.
- The F elastic tasks per original task all consume subsets of the same SSP as
the original task. There will be F subsets (aka key buckets) per SSP, and a
message falls into key bucket 'i' of an SSP if message.key.hash() % F == i.
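The bucket rule above can be sketched in a few lines of Java. This is only an illustration, not the code in this PR: the class and method names are hypothetical, and using `Math.floorMod` (so that negative hash codes still land in a valid bucket) and mapping a null key to bucket 0 are assumptions made for the sketch.

```java
import java.util.Objects;

/** Illustrative sketch only; names are hypothetical, not Samza classes. */
final class KeyBucketSketch {

  /** Returns a bucket in [0, elasticityFactor) for the given message key. */
  static int keyBucket(Object messageKey, int elasticityFactor) {
    if (elasticityFactor < 1) {
      throw new IllegalArgumentException("elasticity factor must be >= 1");
    }
    // Objects.hashCode returns 0 for a null key (assumption: null keys go to bucket 0).
    // floorMod keeps the result non-negative even when the hash code is negative.
    return Math.floorMod(Objects.hashCode(messageKey), elasticityFactor);
  }

  /** With F elastic tasks per original task, N tasks become F times N. */
  static int elasticTaskCount(int originalTaskCount, int elasticityFactor) {
    return Math.multiplyExact(originalTaskCount, elasticityFactor);
  }

  public static void main(String[] args) {
    int factor = 4;
    for (String key : new String[] {"member-1", "member-2", "member-3"}) {
      System.out.println(key + " -> bucket " + keyBucket(key, factor));
    }
    System.out.println("elastic tasks: " + elasticTaskCount(8, factor));
  }
}
```

With a factor of 1 every key maps to bucket 0, which matches a job running without elasticity.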
**Changes:**
1. Introduce the config for enabling elasticity as job.elasticity.factor. If
the job without elasticity has N tasks, then with factor = F > 1 there will be
F times N (elastic) tasks.
2. Add "key bucket" (an integer ranging from 0 to F-1) to SSP, which identifies
the subset of messages within the SSP.
3. Compute the key bucket the IncomingMessageEnvelope falls into given
elasticity factor F.
4. Update SamzaObjectMapper to serialize and deserialize the keyBucket
component of SSP.
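To make change 2 concrete, here is a rough sketch of how adding a key bucket to a partition identity turns each SSP into F distinct units. The class below is hypothetical and stands in for the real SystemStreamPartition; the field layout, the dotted string form, and the choice to include the bucket in equality are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Hypothetical stand-in for an SSP that also carries a key bucket. */
final class BucketedPartitionSketch {
  final String system;
  final String stream;
  final int partition;
  final int keyBucket;

  BucketedPartitionSketch(String system, String stream, int partition, int keyBucket) {
    this.system = system;
    this.stream = stream;
    this.partition = partition;
    this.keyBucket = keyBucket;
  }

  /** Expands every partition of a stream into one entry per key bucket. */
  static List<BucketedPartitionSketch> expand(
      String system, String stream, int partitionCount, int elasticityFactor) {
    List<BucketedPartitionSketch> result = new ArrayList<>();
    for (int p = 0; p < partitionCount; p++) {
      for (int b = 0; b < elasticityFactor; b++) {
        result.add(new BucketedPartitionSketch(system, stream, p, b));
      }
    }
    return result;
  }

  // Assumption: the key bucket is part of the identity, so two buckets
  // of the same partition compare as different units.
  @Override
  public boolean equals(Object o) {
    if (!(o instanceof BucketedPartitionSketch)) {
      return false;
    }
    BucketedPartitionSketch other = (BucketedPartitionSketch) o;
    return system.equals(other.system) && stream.equals(other.stream)
        && partition == other.partition && keyBucket == other.keyBucket;
  }

  @Override
  public int hashCode() {
    return Objects.hash(system, stream, partition, keyBucket);
  }

  @Override
  public String toString() {
    return system + "." + stream + "." + partition + "." + keyBucket;
  }
}
```

For example, `expand("kafka", "PageView", 2, 3)` yields six entries, one per (partition, key bucket) pair, each of which could back its own elastic task.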
**Tests:** updated unit tests and added new ones.
**API Changes:** no public API changes
**Upgrade Instructions:** N/A
**Usage Instructions:** set the config job.elasticity.factor > 1 to enable
elasticity for the job.
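As a hedged illustration of the usage above, the sketch below reads job.elasticity.factor from a plain map of config values. The helper class is hypothetical and is not Samza's config API; treating a missing value as factor 1 (elasticity disabled) and rejecting values below 1 are assumptions of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper; not part of Samza's config classes. */
final class ElasticityConfigSketch {
  static final String ELASTICITY_FACTOR = "job.elasticity.factor";

  /** Returns the configured factor, or 1 when the key is absent (assumed default). */
  static int elasticityFactor(Map<String, String> config) {
    String raw = config.get(ELASTICITY_FACTOR);
    if (raw == null) {
      return 1;
    }
    int factor = Integer.parseInt(raw.trim());
    if (factor < 1) {
      throw new IllegalArgumentException(ELASTICITY_FACTOR + " must be >= 1, got " + factor);
    }
    return factor;
  }

  /** Elasticity is on only when the factor is greater than 1. */
  static boolean isElasticityEnabled(Map<String, String> config) {
    return elasticityFactor(config) > 1;
  }

  public static void main(String[] args) {
    Map<String, String> config = new HashMap<>();
    config.put(ELASTICITY_FACTOR, "2");
    System.out.println("factor = " + elasticityFactor(config));
    System.out.println("enabled = " + isElasticityEnabled(config));
  }
}
```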