C0urante commented on PR #16788: URL: https://github.com/apache/kafka/pull/16788#issuecomment-2271433660
The existing mitigation falls short in the (usually extremely brief) period between when a worker joins the group and when it updates its `configState` snapshot. Distributed herders start off with an empty snapshot that uses -1 as the last-read offset from the config topic, and only update that snapshot in certain circumstances. On startup, the first update usually happens when the worker joins the group and sees a higher offset in its assignment, which means that at least one worker has read up to a higher offset in the config topic. The worker then responds by reading to the end of the config topic and updating its snapshot (see [here](https://github.com/apache/kafka/blob/3ddd8d0a0ec02eab8d9083d341ece14961fc0d1c/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L1810) and [here](https://github.com/apache/kafka/blob/3ddd8d0a0ec02eab8d9083d341ece14961fc0d1c/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L1830)).

I'm hesitant to alter the snapshot tracking logic because doing so increases the blast radius if we make a mistake. On the other hand, the abstraction of not issuing listener updates until startup has completed works well for all other uses of the config topic, but it isn't really necessary for session key tracking: there's never a case where we want to delay using a key we've just read from the config topic.

Actually, I guess there is a small downside to this PR in its current state: it'll cause [this log message](https://github.com/apache/kafka/blob/3ddd8d0a0ec02eab8d9083d341ece14961fc0d1c/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L2510) to be emitted several times on startup, possibly to the point of spamming worker logs. I can try to fix that if the rationale for the overall approach seems sound; let me know what you think.
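For illustration, here is a minimal sketch of the split described above, assuming a config topic reader that gates listener callbacks on startup. Everything in it is hypothetical: `ConfigTopicReader`, `HerderListener`, `ConfigRecord`, and the string record types are stand-ins invented for this example, not Kafka Connect classes or APIs. It shows session-key records reaching the listener immediately while other config records stay gated until startup completes, even though the herder's snapshot offset is still -1.

```java
import java.util.ArrayList;
import java.util.List;

public class SessionKeyDuringStartup {

    /** Hypothetical, simplified record read from the config topic. */
    static final class ConfigRecord {
        final String type;  // "session-key" or "connector-config" (illustrative only)
        final String value;

        ConfigRecord(String type, String value) {
            this.type = type;
            this.value = value;
        }
    }

    /** Hypothetical stand-in for the herder-side listener. */
    static final class HerderListener {
        long snapshotOffset = -1L;   // herders start with an empty snapshot at offset -1
        String sessionKey = null;    // key the herder would use for internal requests
        int sessionKeyUpdates = 0;   // how many times the key handler ran
        final List<String> configUpdates = new ArrayList<>();

        void onSessionKeyUpdate(String key) {
            sessionKey = key;
            sessionKeyUpdates++;
        }

        void onConnectorConfigUpdate(String connector) {
            configUpdates.add(connector);
        }
    }

    /** Hypothetical stand-in for the component that reads the config topic. */
    static final class ConfigTopicReader {
        private final HerderListener listener;
        private boolean startupComplete = false;

        ConfigTopicReader(HerderListener listener) {
            this.listener = listener;
        }

        void process(ConfigRecord record) {
            if (record.type.equals("session-key")) {
                // Proposed behavior: never delay a session key, even before startup completes.
                listener.onSessionKeyUpdate(record.value);
            } else if (startupComplete) {
                // Everything else keeps the "no listener updates until startup" abstraction.
                listener.onConnectorConfigUpdate(record.value);
            }
            // Non-key records seen before startup completes are simply not forwarded here.
        }

        void completeStartup() {
            startupComplete = true;
        }
    }

    public static void main(String[] args) {
        HerderListener herder = new HerderListener();
        ConfigTopicReader reader = new ConfigTopicReader(herder);

        // Replaying the config topic on startup: several historical keys plus a connector config.
        reader.process(new ConfigRecord("session-key", "key-0"));
        reader.process(new ConfigRecord("connector-config", "conn-a"));
        reader.process(new ConfigRecord("session-key", "key-1"));
        reader.process(new ConfigRecord("session-key", "key-2"));
        reader.completeStartup();

        // The snapshot has not been refreshed, but the newest key is already in use.
        System.out.println(herder.snapshotOffset);     // -1
        System.out.println(herder.sessionKey);         // key-2
        System.out.println(herder.sessionKeyUpdates);  // 3
        System.out.println(herder.configUpdates);      // []
    }
}
```

Replaying three historical keys runs the key handler three times, which mirrors the repeated-on-startup downside mentioned above; any deduplication of that output is deliberately left out of the sketch.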