[
https://issues.apache.org/jira/browse/SAMZA-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17306843#comment-17306843
]
Bharath Kumarasubramanian commented on SAMZA-2633:
--------------------------------------------------
We will tackle the changes in multiple stages:
# Changes to processor rebalance flow when work assignment doesn't change [SAMZA-2638]
# Changes to processor startup flow by using last active job model version
# Changes to leader to add criteria on when to trigger rebalance
Items 2 and 3 will be tackled as part of this ticket.
> Rolling deployment/upgrade causes downtime for processors for the entire
> deployment window
> ------------------------------------------------------------------------------------------
>
> Key: SAMZA-2633
> URL: https://issues.apache.org/jira/browse/SAMZA-2633
> Project: Samza
> Issue Type: Bug
> Reporter: Bharath Kumarasubramanian
> Assignee: Bharath Kumarasubramanian
> Priority: Major
>
> *Problem*:
> At LinkedIn, we noticed several standalone users complained about
> lag/downtime during rolling deployments/upgrades.
> *Description*:
> During rolling upgrades, the current debounce timer is extended every time
> there is a quorum change notification. As a result, processors that were
> upgraded earlier in the deployment window remain unavailable while waiting for
> work assignment. In some scenarios, this causes processors to be unavailable
> for 20 minutes or so, depending on the size of the quorum and the debounce
> time configuration.
> *Impact*:
> Partitions that were stopped for the initial processors as part of the
> upgrade remain unassigned for the entire deployment window, which can result
> in processing lag.
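
To make the timer behavior in the description concrete, here is a minimal, self-contained sketch of a debounce timer that restarts on every quorum change notification. It is not Samza code: the class name DebounceSketch, the method firstRebalanceTimeMs, and the numbers in main (10 processors, one notification every 60 seconds, a 120 second debounce) are hypothetical and chosen only for illustration. It also assumes, for simplicity, one notification per processor restart.

```java
class DebounceSketch {
    /**
     * Returns the time (ms) at which the debounce timer first fires, assuming
     * every quorum change notification that arrives before the timer fires
     * restarts it. Notification times must be sorted in ascending order.
     * Returns -1 if there are no notifications.
     */
    static long firstRebalanceTimeMs(long[] notificationTimesMs, long debounceMs) {
        long fireAt = -1;
        for (long t : notificationTimesMs) {
            if (fireAt >= 0 && t >= fireAt) {
                break; // the timer already fired before this notification arrived
            }
            fireAt = t + debounceMs; // restart the timer from this notification
        }
        return fireAt;
    }

    public static void main(String[] args) {
        // Hypothetical rolling upgrade: 10 processors, one restarted every 60 s.
        long[] notifications = new long[10];
        for (int i = 0; i < notifications.length; i++) {
            notifications[i] = i * 60_000L;
        }
        long debounceMs = 120_000L; // hypothetical debounce setting

        long fireAt = firstRebalanceTimeMs(notifications, debounceMs);
        // The first upgraded processor came back at t = 0 and waits until fireAt.
        System.out.println("Rebalance fires at " + fireAt + " ms");
        System.out.println("First processor waits " + (fireAt - notifications[0]) + " ms");
    }
}
```

Under these assumptions, whenever the gap between restarts is shorter than the debounce time, the wait for the first upgraded processor grows with the number of processors in the quorum, which matches the dependence on quorum size and debounce configuration noted in the description.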