[
https://issues.apache.org/jira/browse/KAFKA-9113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Guozhang Wang resolved KAFKA-9113.
----------------------------------
Fix Version/s: 2.6.0
Resolution: Fixed
This clean is done in trunk (2.6.0) now.
> Clean up task management
> ------------------------
>
> Key: KAFKA-9113
> URL: https://issues.apache.org/jira/browse/KAFKA-9113
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Affects Versions: 2.4.0
> Reporter: Sophie Blee-Goldman
> Assignee: Sophie Blee-Goldman
> Priority: Major
> Fix For: 2.6.0
>
>
> Along KIP-429 we did a lot of refactoring of the task management classes,
> including the TaskManager and AssignedTasks (and children). While hopefully
> easier to reason about there's still significant opportunity for further
> cleanup including safer state tracking. Some potential improvements:
> 1) Verify that no tasks are ever in more than one state at once. One
> possibility is to just check that the suspended, created, restoring, and
> running maps are all disjoint, but this begs the question of when and where
> to do those checks, and how often. Another idea might be to put all tasks
> into a single map and just track their state on a per-task basis. Whatever it
> is should be aware that some methods are on the critical code path, and
> should not be burdened with excessive safety checks (ie
> AssignedStreamTasks#process). Alternatively, it seems to make sense to just
> make each state its own type. We can then do some cleanup of the AbstractTask
> and StreamTask classes, which currently contain a number of methods specific
> to only one type/state of task. For example
> * only active running tasks ever need to be suspendable, yet every task does
> through suspend then closeSuspended during close.
> * as the name suggests, closeSuspended should technically only ever apply to
> suspended tasks
> * the code paths needed to perform certain actions such as closing or
> committing a task vary widely between the different states. A restoring task
> need only close its state manager, but skipping the task.close call and
> calling only closeStateManager has lead to confusion and time wasted trying
> to remember or ask someone why that is sufficient
> 2) Cleanup of closing and/or shutdown logic – there are some potential
> improvements to be made here as well, for example AssignedTasks currently
> implements a closeZombieTask method despite the fact that standby tasks are
> never zombies.
> 3) The StoreChangelogReader also interacts with (only) the
> AssignedStreamsTasks class, through the TaskManager. It can be difficult to
> reason about these interactions and the state of the changelog reader.
> 4) All 4 classes and their state have very strict consistency requirements
> that currently are almost impossible to verify, which has already resulted in
> several bugs that we were lucky to catch in time. We should tighten up how
> these classes manage their own state, and how the overall state is managed
> between them, so that it is easy to make changes without introducing new bugs
> because one class updated its own state without knowing it needed to tell
> another class to also update its
--
This message was sent by Atlassian Jira
(v8.3.4#803005)