C0urante commented on PR #16788:
URL: https://github.com/apache/kafka/pull/16788#issuecomment-2271433660

   The existing mitigation falls short during the (usually extremely brief) 
window between when a worker joins the group and when it updates its 
`configState` snapshot.
   
   Distributed herders start off with an empty snapshot that uses -1 as the 
last-read offset from the config topic, and only update that snapshot in 
certain circumstances. On startup, the first update usually happens when the 
worker joins the group, sees a higher offset in its assignment (meaning that at 
least one worker has read up to a higher offset in the config topic), and then 
responds by reading to the end of the config topic and updating its snapshot 
(see 
[here](https://github.com/apache/kafka/blob/3ddd8d0a0ec02eab8d9083d341ece14961fc0d1c/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L1810)
 and 
[here](https://github.com/apache/kafka/blob/3ddd8d0a0ec02eab8d9083d341ece14961fc0d1c/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L1830)).
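
   To make the window concrete, here is a minimal sketch of the catch-up behavior described above. It is not the actual `DistributedHerder` code: the class `SnapshotSketch`, its method names, and the idea of passing the config topic's end offset in as a plain argument are all assumptions made for illustration.

```java
/** Hypothetical, simplified stand-in for a worker's view of the config topic. */
final class SnapshotSketch {
    /** Sentinel for "nothing read yet", matching the -1 described above. */
    static final long NO_OFFSET = -1L;

    // Offset captured by the worker's current snapshot; empty on startup.
    private long snapshotOffset = NO_OFFSET;

    long snapshotOffset() {
        return snapshotOffset;
    }

    /**
     * Called when the worker receives its assignment after joining the group.
     * Returns true if the snapshot was behind the assignment and was refreshed.
     */
    boolean onAssignment(long assignmentOffset, long configTopicEndOffset) {
        if (assignmentOffset <= snapshotOffset) {
            // The snapshot is already at least as new as the group's view.
            return false;
        }
        // Stand-in for "read to the end of the config topic, then re-snapshot".
        snapshotOffset = configTopicEndOffset;
        return true;
    }
}
```

   Until `onAssignment` runs for the first time, `snapshotOffset()` is still -1, so anything that depends solely on the snapshot sees no data from the config topic during that period.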
   
   I'm hesitant to alter the snapshot tracking logic because it increases the 
blast radius if we make a mistake. On the other hand, although the abstraction 
of not issuing listener updates until startup has completed works well for all 
other uses of the config topic, it's not really necessary for session key 
tracking, since we never want to delay using a key we've just read from the 
config topic.
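
   As a rough illustration of that distinction (not the code in this PR), the sketch below gates ordinary listener updates on startup completion but forwards session keys immediately. The class `ConfigListenerGateSketch` and all of its method names are hypothetical and exist only for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical gate between config topic reads and listener callbacks. */
final class ConfigListenerGateSketch {
    // Records which updates actually reached the listener, in order.
    private final List<String> delivered = new ArrayList<>();
    private boolean started = false;

    /** Marks startup as complete; later reads are forwarded to the listener. */
    void markStarted() {
        started = true;
    }

    /** Ordinary updates are suppressed until startup has completed. */
    void onConnectorConfigRead(String connector) {
        if (started) {
            delivered.add("connector:" + connector);
        }
    }

    /** Session keys bypass the gate: a key is forwarded as soon as it is read. */
    void onSessionKeyRead(String keyId) {
        delivered.add("sessionKey:" + keyId);
    }

    List<String> delivered() {
        return delivered;
    }
}
```

   In this sketch, a session key read during startup reaches the listener right away, while a connector config read at the same point does not.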
   
   Actually, I guess there is a small downside to this PR in its current state, 
since it'll cause [this log 
message](https://github.com/apache/kafka/blob/3ddd8d0a0ec02eab8d9083d341ece14961fc0d1c/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L2510)
 to be emitted several times (possibly to the point of spamming worker logs) on 
startup. I can try to fix that if the rationale for the overall approach seems 
sound; let me know what you think.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

