lakshmi-manasa-g opened a new pull request #1576:
URL: https://github.com/apache/samza/pull/1576


   **Feature:** Elasticity (SAMZA-2687) allows a Samza job to have more 
tasks than the number of input SystemStreamPartitions (SSPs). Thus, a job can 
scale up beyond its input partition count without needing to repartition the 
input stream.
   - This is achieved by introducing elastic tasks. An elastic task is the same 
as a regular task for all practical purposes, except that it consumes only a 
subset of the messages of an SSP. 
   - With an elasticity factor F (integer), the number of elastic tasks will be 
F times N with N = original task count. 
   - The F elastic tasks per original task all consume subsets of the same SSP 
as the original task. There will be F subsets (aka key buckets) per SSP, and a 
message falls into SSP bucket 'i' if message.key.hash() % F == i. 
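
   The bucketing rule above can be sketched as follows. This is only an illustration, not the code in this PR: the class and method names are hypothetical, and the use of `Math.floorMod` (so that negative hash codes still map into 0..F-1) and the handling of null keys are assumptions.

```java
import java.util.Objects;

// Hypothetical helper, not part of Samza: illustrates hash(key) % F bucketing.
final class KeyBucketSketch {
    private KeyBucketSketch() {}

    /** Returns the key bucket, an integer in [0, elasticityFactor). */
    static int keyBucket(Object messageKey, int elasticityFactor) {
        if (elasticityFactor < 1) {
            throw new IllegalArgumentException("elasticity factor must be >= 1");
        }
        // Assumption: a null key hashes to 0 (Objects.hashCode(null) == 0).
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(Objects.hashCode(messageKey), elasticityFactor);
    }

    /** Number of elastic tasks: F times N, with N the original task count. */
    static int elasticTaskCount(int originalTaskCount, int elasticityFactor) {
        return Math.multiplyExact(originalTaskCount, elasticityFactor);
    }
}
```

   With F = 4, for example, an Integer key of 10 falls into bucket 2, and a job with 8 original tasks and F = 2 would have 16 elastic tasks.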
    
   **Changes:**
   1. Introduce the config job.elasticity.factor for enabling elasticity. If 
the job without elasticity has N tasks, then with factor = F > 1 there will be 
F times N (elastic) tasks.
   2. Add a "key bucket" (an integer in the range 0 to F-1) to SSP, which 
identifies the subset of messages within the SSP.
   3. Compute the key bucket the IncomingMessageEnvelope falls into given 
elasticity factor F. 
   4. Update SamzaObjectMapper to serialize and deserialize the keyBucket 
component of SSP. 
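
   To make the effect of changes 1 and 2 concrete, here is a rough sketch of how one input stream could expand into (partition, key bucket) pairs. `BucketedSsp` is a hypothetical stand-in, not Samza's SystemStreamPartition class, and the sketch assumes one original task per input partition.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; names and fields are not Samza's API.
final class ElasticSspSketch {
    private ElasticSspSketch() {}

    /** Stand-in for an SSP that also carries a key bucket. */
    static final class BucketedSsp {
        final String system;
        final String stream;
        final int partition;
        final int keyBucket;

        BucketedSsp(String system, String stream, int partition, int keyBucket) {
            this.system = system;
            this.stream = stream;
            this.partition = partition;
            this.keyBucket = keyBucket;
        }

        @Override
        public String toString() {
            return system + "." + stream + "." + partition + "." + keyBucket;
        }
    }

    /** Produces F bucketed SSPs per partition, i.e. F times N in total. */
    static List<BucketedSsp> expand(String system, String stream,
                                    int partitionCount, int elasticityFactor) {
        List<BucketedSsp> result = new ArrayList<>();
        for (int partition = 0; partition < partitionCount; partition++) {
            for (int bucket = 0; bucket < elasticityFactor; bucket++) {
                result.add(new BucketedSsp(system, stream, partition, bucket));
            }
        }
        return result;
    }
}
```

   For a 4-partition stream and F = 2, `expand` yields 8 entries, each of which would be consumed by its own elastic task under the assumption above.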
    
   
**Tests:** updated unit tests and added new ones.
   

   
**API Changes:** no public API changes
    
   **Upgrade Instructions:** N/A
    
   **Usage Instructions:** set the config job.elasticity.factor > 1 to enable 
elasticity for the job. 

