Sophie Blee-Goldman created KAFKA-9113:
------------------------------------------

             Summary: Clean up task management
                 Key: KAFKA-9113
                 URL: https://issues.apache.org/jira/browse/KAFKA-9113
             Project: Kafka
          Issue Type: Improvement
    Affects Versions: 2.4.0
            Reporter: Sophie Blee-Goldman


As part of KIP-429 we did a lot of refactoring of the task management classes, 
including the TaskManager and AssignedTasks (and its subclasses).  While the 
result is hopefully easier to reason about, there's still significant 
opportunity for further cleanup, including safer state tracking.  Some 
potential improvements:

1) Verify that no task is ever in more than one state at once. One 
possibility is to check that the suspended, created, restoring, and 
running maps are all disjoint, but this raises the question of when, where, 
and how often to do those checks. Another idea is to put all tasks into a 
single map and track their state on a per-task basis. Whatever we choose 
should take into account that some methods are on the critical code path and 
should not be burdened with excessive safety checks (e.g. 
AssignedStreamsTasks#process).

2) Cleanup of closing and/or shutdown logic – there are some potential 
improvements to be made here as well. For example, AssignedTasks currently 
implements a closeZombieTask method despite the fact that standby tasks are 
never zombies.

3) The StoreChangelogReader interacts only with the AssignedStreamsTasks 
class, and does so through the TaskManager. It can be difficult to reason 
about these interactions and the state of the changelog reader.

4) All four classes and their state have very strict consistency requirements 
that are currently almost impossible to verify, which has already resulted in 
several bugs that we were lucky to catch in time. We should tighten up how 
these classes manage their own state, and how the overall state is managed 
between them, so that it is easy to make changes without introducing new bugs 
where one class updates its own state without knowing it needed to tell 
another class to update its state as well.
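
To make the two options in (1) concrete, here is a rough sketch of what each 
could look like. It is illustrative only: TaskStateSketch, State, areDisjoint, 
add, transition and stateOf are hypothetical names invented for this example 
and are not existing Kafka Streams classes or methods, and task ids are 
simplified to plain strings.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; none of these names exist in Kafka Streams.
final class TaskStateSketch {

    // The four states named in the ticket.
    enum State { CREATED, RESTORING, RUNNING, SUSPENDED }

    // Option A: check that the per-state id collections are disjoint.
    // Cost is linear in the total number of ids, so this would belong at
    // rebalance boundaries rather than in the processing loop.
    @SafeVarargs
    static <T> boolean areDisjoint(Collection<T>... groups) {
        Set<T> seen = new HashSet<>();
        for (Collection<T> group : groups) {
            for (T id : group) {
                if (!seen.add(id)) {
                    return false; // this id was already seen in some group
                }
            }
        }
        return true;
    }

    // Option B: a single map from task id to state. A task cannot be in
    // two states at once because it has exactly one entry.
    private final Map<String, State> states = new HashMap<>();

    // Register a new task in the CREATED state; reject duplicates.
    void add(String taskId) {
        if (states.putIfAbsent(taskId, State.CREATED) != null) {
            throw new IllegalStateException("Task already registered: " + taskId);
        }
    }

    // Move a task to a new state, but only if it is in the state the
    // caller expects. This check runs on state changes only, so the hot
    // path (processing records) pays nothing for it.
    void transition(String taskId, State expected, State next) {
        State current = states.get(taskId);
        if (current != expected) {
            throw new IllegalStateException(
                "Task " + taskId + " is " + current + ", expected " + expected);
        }
        states.put(taskId, next);
    }

    // Returns the current state, or null if the task is unknown.
    State stateOf(String taskId) {
        return states.get(taskId);
    }
}
```

With option A the check would be called on the key sets of the existing maps, 
e.g. areDisjoint(created.keySet(), restoring.keySet(), ...). Option B trades 
that periodic check for a precondition on every transition, which keeps the 
cost off the per-record path.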



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
