[
https://issues.apache.org/jira/browse/KAFKA-10086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
John Roesler updated KAFKA-10086:
---------------------------------
Description:
This ticket was initially just to write an integration test, but I escalated it
to a blocker and changed the title when the integration test actually surfaced
two bugs:
# Offset positions were not reported for in-memory stores, so tasks with
in-memory stores would never be considered "caught up" and could not take
over active processing, preventing clusters from ever achieving balance. This
is a regression in 2.6.
# When the TaskAssignor decided to switch active processing from a former
owner to a new one that had a standby, the lower-level cooperative rebalance
protocol would first de-schedule the task completely, and only later would
assign it to the new owner. For in-memory stores, this caused the standby state
not to be re-used, and for persistent stores, it created a window in which the
cleanup thread might delete the state directory. In both cases, even though the
instance previously had a standby, once it got the active task, it still had to
restore the entire state from the changelog.
was:
This ticket was initially just to write an integration test, but I escalated it
to a blocker and changed the title when the integration test actually surfaced
two bugs:
1. Offset positions were not reported for in-memory stores,
> Standby state isn't always re-used when transitioning to active
> ---------------------------------------------------------------
>
> Key: KAFKA-10086
> URL: https://issues.apache.org/jira/browse/KAFKA-10086
> Project: Kafka
> Issue Type: Task
> Components: streams
> Affects Versions: 2.6.0, 2.7.0
> Reporter: John Roesler
> Assignee: John Roesler
> Priority: Blocker
> Fix For: 2.6.0, 2.7.0
>
>
> This ticket was initially just to write an integration test, but I escalated
> it to a blocker and changed the title when the integration test actually
> surfaced two bugs:
> # Offset positions were not reported for in-memory stores, so tasks with
> in-memory stores would never be considered "caught up" and could not take
> over active processing, preventing clusters from ever achieving balance. This
> is a regression in 2.6.
> # When the TaskAssignor decided to switch active processing from a former
> owner to a new one that had a standby, the lower-level cooperative rebalance
> protocol would first de-schedule the task completely, and only later would
> assign it to the new owner. For in-memory stores, this caused the standby
> state not to be re-used, and for persistent stores, it created a window in
> which the cleanup thread might delete the state directory. In both cases,
> even though the instance previously had a standby, once it got the active
> task, it still had to restore the entire state from the changelog.
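
To illustrate the first bug, here is a minimal sketch of how a missing offset
position can keep a task from ever looking "caught up". It is not the Kafka
Streams implementation: the class CaughtUpSketch, the methods lag and
isCaughtUp, and the ACCEPTABLE_LAG threshold are all hypothetical names chosen
for this example, and treating an unreported position as "nothing restored yet"
is an assumption about how such a check might behave.

```java
import java.util.OptionalLong;

public class CaughtUpSketch {
    // Hypothetical threshold: how far behind the changelog end a store may be
    // and still count as "caught up". Not taken from the ticket.
    static final long ACCEPTABLE_LAG = 10_000L;

    // Lag of a store relative to the end of its changelog.
    // Assumption: if the store reports no position at all, the whole
    // changelog is counted as lag.
    static long lag(long changelogEndOffset, OptionalLong reportedPosition) {
        if (reportedPosition.isPresent()) {
            return Math.max(0L, changelogEndOffset - reportedPosition.getAsLong());
        }
        return changelogEndOffset;
    }

    // A task is "caught up" when its lag is within the acceptable threshold.
    static boolean isCaughtUp(long changelogEndOffset, OptionalLong reportedPosition) {
        return lag(changelogEndOffset, reportedPosition) <= ACCEPTABLE_LAG;
    }

    public static void main(String[] args) {
        long endOffset = 1_000_000L;

        // A store that reports its position and is only 500 records behind.
        System.out.println(isCaughtUp(endOffset, OptionalLong.of(999_500L))); // true

        // A store that never reports a position (the in-memory case described
        // above): it looks fully behind, no matter how much it has restored.
        System.out.println(isCaughtUp(endOffset, OptionalLong.empty())); // false
    }
}
```

Under these assumptions, a standby whose store never reports a position can
never pass the check, which matches the symptom described above: the task is
never handed over and the cluster does not reach balance.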