[ 
https://issues.apache.org/jira/browse/KAFKA-13501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17867880#comment-17867880
 ] 

Matthias J. Sax commented on KAFKA-13501:
-----------------------------------------

I don't think the state updater solves this? I'm also not sure why it's labeled 
with "new-streams-runtime-should-fix"; I don't see how that would help.

If a task fails locally and we restart it locally, we rebuild its state 
from scratch. With the state updater the same happens. The "only" difference 
with the state updater (which can be significant) is that we no longer block 
all tasks from processing: we keep processing all other tasks while the 
state updater does the restore.

However, the failed task still has offline time. The idea of this 
ticket was: if we have two instances A and B, the local failure 
happens on A, and B has a standby, let's trigger a rebalance and move the 
failed task to B to avoid offline time for the failed task altogether. On 
instance A, we might still rebuild the state using the state updater, but B 
would take over processing in the meantime. After A is done restoring, we could 
do another rebalance to move the active task back from B to A (and still keep a 
standby on B).
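
To make the proposed flow concrete, here is a rough sketch of the two reassignment steps as a toy model. It is not Kafka Streams code: `StandbyFailoverSketch`, `TaskPlacement`, `onLocalFailure`, and `onRestoreComplete` are hypothetical names made up for illustration, and the real assignor would have to take many more constraints into account.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: none of these types or methods exist in Kafka Streams.
public class StandbyFailoverSketch {

    /** Hypothetical view of where the active and standby copies of one task live. */
    public static final class TaskPlacement {
        public final String active;
        public final Set<String> standbys;

        public TaskPlacement(String active, Set<String> standbys) {
            this.active = active;
            this.standbys = new TreeSet<>(standbys); // defensive, sorted copy
        }
    }

    /**
     * Local failure on the active host: promote a standby host if one exists,
     * and demote the failed host to a standby so it can restore in the background.
     */
    public static TaskPlacement onLocalFailure(TaskPlacement p) {
        if (p.standbys.isEmpty()) {
            return p; // no standby: the task stays put and is offline while restoring
        }
        TreeSet<String> standbys = new TreeSet<>(p.standbys);
        String promoted = standbys.pollFirst(); // deterministic pick for the sketch
        standbys.add(p.active);
        return new TaskPlacement(promoted, standbys);
    }

    /**
     * The original host has finished restoring: move the active task back to it
     * and keep a standby on the host that covered in the meantime.
     */
    public static TaskPlacement onRestoreComplete(TaskPlacement p, String restored) {
        if (!p.standbys.contains(restored)) {
            return p; // nothing to move back
        }
        TreeSet<String> standbys = new TreeSet<>(p.standbys);
        standbys.remove(restored);
        standbys.add(p.active);
        return new TaskPlacement(restored, standbys);
    }

    public static void main(String[] args) {
        TaskPlacement start = new TaskPlacement("A", new TreeSet<>(Arrays.asList("B")));
        TaskPlacement failedOver = onLocalFailure(start);
        TaskPlacement movedBack = onRestoreComplete(failedOver, "A");
        System.out.println("after failure: active=" + failedOver.active
                + " standbys=" + failedOver.standbys);
        System.out.println("after restore: active=" + movedBack.active
                + " standbys=" + movedBack.standbys);
    }
}
```

Each of the two methods corresponds to one rebalance in the description above; whether the second rebalance (moving the active task back to A) is worth its cost is a separate question.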

Does this make sense? (Maybe the ticket description was too brief?)

> Avoid state restore via rebalance if standbys are enabled
> ---------------------------------------------------------
>
>                 Key: KAFKA-13501
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13501
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Matthias J. Sax
>            Priority: Major
>              Labels: new-streams-runtime-should-fix
>
> There are certain scenarios in which Kafka Streams wipes out local state and 
> rebuilds it from scratch. This is a thread-local cleanup, i.e., no rebalance is 
> triggered, and we end up with an offline task until state restoration 
> finishes.
> If standby tasks are enabled, it might actually make sense to trigger a 
> rebalance instead, to get the task re-assigned to the instance hosting the 
> standby, so that the task becomes active again quickly.
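
For context on "standby tasks are enabled": standbys are controlled by the `num.standby.replicas` Streams config, which defaults to 0. A minimal sketch of turning on one standby per task follows; the class and method names are hypothetical, and a real application would need further settings such as `application.id` and `bootstrap.servers`.

```java
import java.util.Properties;

// Illustrative only: StandbyConfigSketch and withOneStandby are made-up names.
public class StandbyConfigSketch {

    /** Builds a config fragment that asks for one standby replica per task. */
    public static Properties withOneStandby() {
        Properties props = new Properties();
        // "num.standby.replicas" is the Kafka Streams config key; default is 0.
        props.setProperty("num.standby.replicas", "1");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(withOneStandby());
    }
}
```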



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
