[ https://issues.apache.org/jira/browse/SAMZA-2663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bharath Kumarasubramanian updated SAMZA-2663:
---------------------------------------------
Summary: Handle job model expiration and new job model flows for multiple
incomplete rebalances (was: Update active job model to proposed job model on
job model expiration)
> Handle job model expiration and new job model flows for multiple incomplete
> rebalances
> --------------------------------------------------------------------------------------
>
> Key: SAMZA-2663
> URL: https://issues.apache.org/jira/browse/SAMZA-2663
> Project: Samza
> Issue Type: Bug
> Reporter: Bharath Kumarasubramanian
> Assignee: Bharath Kumarasubramanian
> Priority: Major
>
> *Problem*:
> As part of SAMZA-2638, we introduced skipping container restarts and stops
> when a processor's work assignment does not change across rebalances.
> However, we only update the active job model with the proposed job model when
> starting the container as part of `onNewJobModel`. This leads to a scenario
> where the processor's container is stopped, but subsequent rebalances assume
> it is still running. The scenario is described in more detail below.
> *Scenario*:
> Imagine the quorum is in a steady state with job model version v1. A new
> rebalance occurs and the leader generates v2. Processor P1 has changes in
> work assignment and, as a result, stops its container as part of job model
> expiration. However, if the rebalance is unsuccessful (the barrier times
> out), a new rebalance occurs and generates job model version v3. If P1's
> work assignment in v3 is the same as in v1, the state transition assumes
> the processor has not stopped its container and proceeds with a no-op,
> leaving P1 without a running container.
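The scenario can be replayed with a small Java toy model of the pre-fix flow. This is a sketch under stated assumptions, not Samza's code: `JobModel`, `Processor`, `replayScenario`, the task names and all fields are hypothetical; only the method names `checkAndExpireJobModel` and `onNewJobModel` come from this issue. The model assumes that the work-assignment decision made during expiration is what lets `onNewJobModel` skip the restart.

```java
import java.util.Objects;
import java.util.Set;

// Hypothetical toy model of the pre-fix flow described in SAMZA-2663. Method names
// echo the issue text; JobModel, Processor and every field/signature here are
// illustrative simplifications, not Samza's actual classes.
public class PreFixScenario {
  /** Simplified job model: a version plus the tasks assigned to processor P1. */
  record JobModel(String version, Set<String> tasks) {}

  static final class Processor {
    JobModel activeJobModel;         // updated only when the container (re)starts
    boolean containerRunning = true;
    boolean workChanged;             // decided during expiration, read by onNewJobModel

    Processor(JobModel initial) {
      this.activeJobModel = initial;
    }

    void checkAndExpireJobModel(JobModel proposed) {
      workChanged = !Objects.equals(activeJobModel.tasks(), proposed.tasks());
      if (workChanged) {
        containerRunning = false;    // stop the container; activeJobModel is NOT updated
      }
    }

    /** Returns true if the container was (re)started for this job model. */
    boolean onNewJobModel(JobModel model) {
      if (!workChanged) {
        return false;                // assumes the container is still running: no-op
      }
      activeJobModel = model;
      containerRunning = true;
      return true;
    }
  }

  /** Replays the v1 -> v2 (barrier timeout) -> v3 sequence for P1. */
  static Processor replayScenario() {
    JobModel v1 = new JobModel("v1", Set.of("task-0", "task-1"));
    JobModel v2 = new JobModel("v2", Set.of("task-0"));
    JobModel v3 = new JobModel("v3", Set.of("task-0", "task-1")); // same work as v1

    Processor p1 = new Processor(v1);
    p1.checkAndExpireJobModel(v2);   // work changed: container stopped
    // The v2 barrier times out, so onNewJobModel(v2) never runs.
    p1.checkAndExpireJobModel(v3);   // compared against the stale v1: "no change"
    p1.onNewJobModel(v3);            // no-op, although the container is stopped
    return p1;
  }

  public static void main(String[] args) {
    Processor p1 = replayScenario();
    System.out.println("running=" + p1.containerRunning
        + ", active=" + p1.activeJobModel.version());
  }
}
```

Running it prints `running=false, active=v1`: the v3 comparison is made against the stale v1 model, so P1 is left with a stopped container.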
> *Changes*:
> Update the active job model during `checkAndExpireJobModel`, regardless of
> whether we render the current job model obsolete for the current processor.
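A matching Java sketch of the flow after the change, again with hypothetical `JobModel` and `Processor` stand-ins rather than Samza's classes: `checkAndExpireJobModel` replaces the active job model with the proposed one whether or not the work assignment changed. Keying the restart skip in `onNewJobModel` on whether the container is actually running is an additional assumption of this sketch, not something the issue specifies.

```java
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch of the flow after the change. As the issue describes, the
// active job model is replaced by the proposed one on every expiration check.
// Skipping a restart only while the container is actually running is an extra
// assumption of this sketch, not something the issue states.
public class FixedFlowSketch {
  /** Simplified job model: a version plus the tasks assigned to processor P1. */
  record JobModel(String version, Set<String> tasks) {}

  static final class Processor {
    JobModel activeJobModel;
    boolean containerRunning = true;

    Processor(JobModel initial) {
      this.activeJobModel = initial;
    }

    void checkAndExpireJobModel(JobModel proposed) {
      boolean workChanged = !Objects.equals(activeJobModel.tasks(), proposed.tasks());
      if (workChanged) {
        containerRunning = false;    // stop the container (no-op if already stopped)
      }
      // The change: track the proposed model whether or not it expired the current one.
      activeJobModel = proposed;
    }

    /** Returns true if the container was (re)started for this job model. */
    boolean onNewJobModel(JobModel model) {
      if (containerRunning) {
        return false;                // still running with unchanged work: skip restart
      }
      activeJobModel = model;
      containerRunning = true;       // restart the container with this job model
      return true;
    }
  }

  /** Replays the same v1 -> v2 (barrier timeout) -> v3 sequence for P1. */
  static Processor replayScenario() {
    JobModel v1 = new JobModel("v1", Set.of("task-0", "task-1"));
    JobModel v2 = new JobModel("v2", Set.of("task-0"));
    JobModel v3 = new JobModel("v3", Set.of("task-0", "task-1")); // same work as v1

    Processor p1 = new Processor(v1);
    p1.checkAndExpireJobModel(v2);   // work changed: container stopped, active = v2
    // The v2 barrier times out, so onNewJobModel(v2) never runs.
    p1.checkAndExpireJobModel(v3);   // compared against v2: change detected, active = v3
    p1.onNewJobModel(v3);            // container is stopped, so it is restarted
    return p1;
  }

  public static void main(String[] args) {
    Processor p1 = replayScenario();
    System.out.println("running=" + p1.containerRunning
        + ", active=" + p1.activeJobModel.version());
  }
}
```

Replaying the same v1, v2, v3 sequence prints `running=true, active=v3`.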
--
This message was sent by Atlassian Jira
(v8.3.4#803005)