lakshmi-manasa-g opened a new pull request, #1598:
URL: https://github.com/apache/samza/pull/1598

   Feature: Elasticity (SAMZA-2687) for a Samza job allows the job to have more 
tasks than the number of input SystemStreamPartitions (SSPs). Thus, a job can 
scale up beyond its input partition count without needing to repartition the 
input stream.
   This PR computes the last processed offsets when a container starts up, 
using checkpoints from previous deploys. Either the current deploy or the 
previous deploys may have an elasticity factor > 1.
   
   Changes:
   1. Introduce ElasticityUtils which contains 
computeLastProcessedOffsetsFromCheckpointMap that computes a task’s last 
processed offsets using all the checkpoints present in the checkpoint stream 
for all tasks that were ever part of the job model.
   2. Update OffsetManager.loadOffsetsFromCheckpointManager to compute the 
checkpoint using ElasticityUtils if either the config 
“job.elasticity.checkpoints.enabled” is set to true or the checkpoint stream 
contains checkpoints with elastic task names.
   3. Introduce the config “job.elasticity.checkpoints.enabled”, which is 
disabled by default and should be enabled when rolling back to disable 
elasticity or going back to elasticity factor = 1.
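
   The decision described in change 2 can be sketched roughly as follows. This 
is an illustration only, not the code in this PR: the class name 
ElasticCheckpointDecisionSketch, the method names, and in particular the 
"Partition <n>_<bucket>_<factor>" pattern used to recognize an elastic task 
name are assumptions made for the example.

```java
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper; not part of Samza.
class ElasticCheckpointDecisionSketch {
    // ASSUMPTION: an elastic task name looks like "Partition 0_1_2".
    // The real naming scheme may differ.
    private static final Pattern ELASTIC_TASK_NAME =
        Pattern.compile("Partition \\d+_\\d+_\\d+");

    // True if the given task name matches the assumed elastic pattern.
    static boolean isElasticTaskName(String taskName) {
        return ELASTIC_TASK_NAME.matcher(taskName).matches();
    }

    // Use the elasticity-aware offset computation if the config flag is on,
    // or if any checkpointed task name looks like an elastic task name.
    static boolean useElasticityAwareOffsets(boolean configEnabled,
                                             Set<String> checkpointedTaskNames) {
        if (configEnabled) {
            return true;
        }
        return checkpointedTaskNames.stream()
            .anyMatch(ElasticCheckpointDecisionSketch::isElasticTaskName);
    }
}
```

   With the flag left at its default, a checkpoint stream holding only 
non-elastic task names keeps the existing checkpoint computation.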
   
   Tests:
   1. Added tests for ElasticityUtils (this test class is yet to be 
parametrized).
   2. Pending: add a unit test for OffsetManager.
   
   API changes: 
     No public API change. A new config, 
“job.elasticity.checkpoints.enabled” (default false), is introduced; when 
true, checkpoints from previous deploys are also checked.
   
   Upgrade instructions: none
   
   Usage instructions: set “job.elasticity.checkpoints.enabled” to true when 
rolling back to disable elasticity.
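
   As a minimal sketch of the setting above, the flag can be treated as a 
plain string key in the job config that reads as false when absent. Only the 
key name and its default come from this PR; the class 
ElasticityCheckpointConfigSketch and its methods are hypothetical and use a 
plain Map instead of Samza's config classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; not part of Samza.
class ElasticityCheckpointConfigSketch {
    static final String ELASTICITY_CHECKPOINTS_ENABLED =
        "job.elasticity.checkpoints.enabled";

    // Reads the flag, treating a missing key as false (the stated default).
    static boolean isEnabled(Map<String, String> jobConfig) {
        return Boolean.parseBoolean(
            jobConfig.getOrDefault(ELASTICITY_CHECKPOINTS_ENABLED, "false"));
    }

    // Returns a copy of the config with the flag turned on, as one would
    // want when rolling back to disable elasticity.
    static Map<String, String> withElasticityCheckpoints(Map<String, String> base) {
        Map<String, String> copy = new HashMap<>(base);
        copy.put(ELASTICITY_CHECKPOINTS_ENABLED, "true");
        return copy;
    }
}
```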
   
   Backwards compatible: yes. The existing checkpoint computation is 
unaffected because “job.elasticity.checkpoints.enabled” = false by default.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
