[ https://issues.apache.org/jira/browse/SAMZA-2663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bharath Kumarasubramanian updated SAMZA-2663:
---------------------------------------------
    Summary: Handle job model expiration and new job model flows for multiple 
incomplete rebalances  (was: Update active job model to proposed job model on 
job model expiration)

> Handle job model expiration and new job model flows for multiple incomplete 
> rebalances
> --------------------------------------------------------------------------------------
>
>                 Key: SAMZA-2663
>                 URL: https://issues.apache.org/jira/browse/SAMZA-2663
>             Project: Samza
>          Issue Type: Bug
>            Reporter: Bharath Kumarasubramanian
>            Assignee: Bharath Kumarasubramanian
>            Priority: Major
>
> *Problem*:
> As part of SAMZA-2638, we introduced skipping container restarts and stops 
> when a processor's work assignment does not change across rebalances. However, 
> we only update the active job model with the proposed job model when starting 
> the container in `onNewJobModel`. This leads to a scenario where the processor 
> has stopped its container, but subsequent rebalances assume the container is 
> still running. More information on the scenario below.
> *Scenario*: 
> Imagine the quorum is in a steady state with job model version v1. A new 
> rebalance occurs and the leader generates v2. Processor P1's work assignment 
> changes, so it stops its container as part of job model expiration. However, 
> if the rebalance is unsuccessful (the barrier times out), a new rebalance 
> occurs and generates job model version v3. If P1's work assignment in v3 is 
> the same as in v1, the state transition assumes the processor hasn't stopped 
> its container and proceeds with a no-op, so P1's container stays stopped.
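The pre-fix flow in the problem and scenario above can be reproduced with a minimal sketch. It is not Samza code: `JobModelSketch`, `workFor`, `PreFixProcessor` and `containerRunning` are hypothetical names, a job model is reduced to a version plus each processor's task set, and a barrier timeout is modeled simply by never calling `onNewJobModel` for v2. Records require Java 16 or later.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical, simplified job model: a version plus each processor's task set.
record JobModelSketch(int version, Map<String, Set<String>> assignments) {
  Set<String> workFor(String processorId) {
    return assignments.getOrDefault(processorId, Set.of());
  }
}

// Sketch of the pre-fix flow described in the ticket (names are illustrative).
class PreFixProcessor {
  final String processorId;
  JobModelSketch activeJobModel;
  boolean containerRunning = true; // steady state: running with the initial model

  PreFixProcessor(String processorId, JobModelSketch initial) {
    this.processorId = processorId;
    this.activeJobModel = initial;
  }

  // Called when the leader proposes a model, before the barrier completes.
  void checkAndExpireJobModel(JobModelSketch proposed) {
    if (!Objects.equals(proposed.workFor(processorId), activeJobModel.workFor(processorId))) {
      containerRunning = false; // work assignment changed: stop the container
    }
    // Bug: activeJobModel is not updated here.
  }

  // Called only after the barrier for the proposed model completes.
  void onNewJobModel(JobModelSketch proposed) {
    if (Objects.equals(proposed.workFor(processorId), activeJobModel.workFor(processorId))) {
      return; // assignment looks unchanged, so the restart is skipped
    }
    containerRunning = true;   // start the container with the new assignment
    activeJobModel = proposed; // the only place the active model is updated
  }
}

class PreFixScenarioDemo {
  public static void main(String[] args) {
    JobModelSketch v1 = new JobModelSketch(1, Map.of("P1", Set.of("task-0")));
    JobModelSketch v2 = new JobModelSketch(2, Map.of("P1", Set.of("task-1")));
    JobModelSketch v3 = new JobModelSketch(3, Map.of("P1", Set.of("task-0")));

    PreFixProcessor p1 = new PreFixProcessor("P1", v1);
    p1.checkAndExpireJobModel(v2); // assignment changed: container stopped
    // The barrier for v2 times out, so onNewJobModel(v2) is never called.
    p1.checkAndExpireJobModel(v3); // v3 matches the stale active model v1: no expiration
    p1.onNewJobModel(v3);          // also matches v1: treated as a no-op
    System.out.println(p1.containerRunning + " " + p1.activeJobModel.version());
  }
}
```

Running `PreFixScenarioDemo` prints `false 1`: the container is stopped while the active model still claims v1.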
> *Changes*:
> Update the active job model during `checkAndExpireJobModel` regardless of 
> whether the proposed job model renders the current one obsolete for this processor.
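A minimal sketch of the change, under stated assumptions: `checkAndExpireJobModel` now adopts every proposed model. Because the active model then already equals the proposed one when `onNewJobModel` runs, comparing the two can no longer decide whether to restart, so this sketch adds a hypothetical `jobModelExpired` flag that records that the container was stopped. The ticket does not say how the committed patch makes that decision; all names here (`JobModelSketch`, `FixedProcessor`, `jobModelExpired`) are illustrative, not Samza's API.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical, simplified job model: a version plus each processor's task set.
record JobModelSketch(int version, Map<String, Set<String>> assignments) {
  Set<String> workFor(String processorId) {
    return assignments.getOrDefault(processorId, Set.of());
  }
}

// Sketch of the post-fix flow; the jobModelExpired flag is an assumption.
class FixedProcessor {
  final String processorId;
  JobModelSketch activeJobModel;
  boolean containerRunning = true; // steady state: running with the initial model
  boolean jobModelExpired = false; // set once the container is stopped for a rebalance

  FixedProcessor(String processorId, JobModelSketch initial) {
    this.processorId = processorId;
    this.activeJobModel = initial;
  }

  void checkAndExpireJobModel(JobModelSketch proposed) {
    boolean assignmentChanged =
        !Objects.equals(proposed.workFor(processorId), activeJobModel.workFor(processorId));
    if (assignmentChanged && !jobModelExpired) {
      jobModelExpired = true;   // remember that the container is down
      containerRunning = false; // stop the container
    }
    // The change described in the ticket: adopt the proposed model whether
    // or not it expired the current one for this processor.
    activeJobModel = proposed;
  }

  void onNewJobModel(JobModelSketch proposed) {
    activeJobModel = proposed;
    if (!jobModelExpired) {
      return; // container was never stopped, so no restart is needed
    }
    jobModelExpired = false;
    containerRunning = true; // restart with the new assignment
  }
}

class FixedScenarioDemo {
  public static void main(String[] args) {
    JobModelSketch v1 = new JobModelSketch(1, Map.of("P1", Set.of("task-0")));
    JobModelSketch v2 = new JobModelSketch(2, Map.of("P1", Set.of("task-1")));
    JobModelSketch v3 = new JobModelSketch(3, Map.of("P1", Set.of("task-0")));

    FixedProcessor p1 = new FixedProcessor("P1", v1);
    p1.checkAndExpireJobModel(v2); // stopped; active model is now v2
    // The barrier for v2 times out; the leader proposes v3.
    p1.checkAndExpireJobModel(v3); // already expired; active model is now v3
    p1.onNewJobModel(v3);          // expired flag set: container restarts
    System.out.println(p1.containerRunning + " " + p1.activeJobModel.version());
  }
}
```

Replaying the same v1, v2, v3 sequence now prints `true 3`. When a rebalance leaves P1's assignment untouched, the flag is never set and `onNewJobModel` still skips the restart, preserving the SAMZA-2638 optimization.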



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
