[ 
https://issues.apache.org/jira/browse/KAFKA-10086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Roesler updated KAFKA-10086:
---------------------------------
    Description: 
This ticket was initially just to write an integration test, but I escalated it 
to a blocker and changed the title when the integration test actually surfaced 
two bugs:
 # Offset positions were not reported for in-memory stores, so tasks with 
in-memory stores would never be considered as "caught up" and could not take 
over active processing, preventing clusters from ever achieving balance. This 
is a regression in 2.6.
 # When the TaskAssignor decided to switch active processing from a former 
owner to a new one that had a standby, the lower-level cooperative rebalance 
protocol would first de-schedule the task completely, and only later would 
assign it to the new owner. For in-memory stores, this caused the standby state 
not to be re-used, and for persistent stores, it created a window in which the 
cleanup thread might delete the state directory. In both cases, even though the 
instance previously had a standby, once it got the active task, it still had to 
restore the entire state from the changelog.
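
To illustrate the first bug, here is a minimal sketch under a simplified model; it is not the actual Kafka Streams code. The names `ACCEPTABLE_LAG`, `task_lag`, and `is_caught_up` are hypothetical, and the sketch assumes that a store reporting no offset position is treated as if it held no state at all:

```python
from typing import Optional

# Hypothetical threshold: how far behind the changelog end a store may be
# while still counting as "caught up". Not a real Kafka Streams default.
ACCEPTABLE_LAG = 100


def task_lag(end_offset: int, reported_position: Optional[int]) -> int:
    """Lag of a store behind its changelog.

    Assumption: a store that reports no position is treated as empty,
    so its lag equals the full changelog length.
    """
    if reported_position is None:
        return end_offset
    return max(end_offset - reported_position, 0)


def is_caught_up(end_offset: int, reported_position: Optional[int]) -> bool:
    """A task is 'caught up' when its lag is within the acceptable threshold."""
    return task_lag(end_offset, reported_position) <= ACCEPTABLE_LAG


# A store that reports its position is recognized as caught up.
persistent = is_caught_up(end_offset=50_000, reported_position=49_990)

# A standby holding the same data but reporting no position looks fully
# lagged, so it is never considered eligible to take over as active.
in_memory = is_caught_up(end_offset=50_000, reported_position=None)

print(persistent, in_memory)
```

Under these assumptions, the standby that reports no position never passes the check, however much state it actually holds, which matches the symptom described in the first item.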

  was:
This ticket was initially just to write an integration test, but I escalated it 
to a blocker and changed the title when the integration test actually surfaced 
two bugs:

1. Offset positions were not reported for in-memory stores,


> Standby state isn't always re-used when transitioning to active
> ---------------------------------------------------------------
>
>                 Key: KAFKA-10086
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10086
>             Project: Kafka
>          Issue Type: Task
>          Components: streams
>    Affects Versions: 2.6.0, 2.7.0
>            Reporter: John Roesler
>            Assignee: John Roesler
>            Priority: Blocker
>             Fix For: 2.6.0, 2.7.0
>
>
> This ticket was initially just to write an integration test, but I escalated 
> it to a blocker and changed the title when the integration test actually 
> surfaced two bugs:
>  # Offset positions were not reported for in-memory stores, so tasks with 
> in-memory stores would never be considered as "caught up" and could not take 
> over active processing, preventing clusters from ever achieving balance. This 
> is a regression in 2.6.
>  # When the TaskAssignor decided to switch active processing from a former 
> owner to a new one that had a standby, the lower-level cooperative rebalance 
> protocol would first de-schedule the task completely, and only later would 
> assign it to the new owner. For in-memory stores, this caused the standby 
> state not to be re-used, and for persistent stores, it created a window in 
> which the cleanup thread might delete the state directory. In both cases, 
> even though the instance previously had a standby, once it got the active 
> task, it still had to restore the entire state from the changelog.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
