gharris1727 commented on PR #16788:
URL: https://github.com/apache/kafka/pull/16788#issuecomment-2272393801

   > No, it does not affect correctness, it's just a useful signal to help narrow down the cause of failure.
   
   This confused me, and I realized it's because my question was poorly formed. Instead of "affect correctness" I should have said "cause the test to fail", which it does, and you provided a test where that happened:
   ```
   [2024-08-02 04:34:10,745] ERROR [MirrorSourceConnector|task-0] ExactlyOnceWorkerSourceTask{id=MirrorSourceConnector-0} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask:234)
   org.apache.kafka.connect.runtime.rest.errors.ConnectRestException: This worker is still starting up and has not been able to read a session key from the config topic yet
       at org.apache.kafka.connect.runtime.rest.RestClient.httpRequest(RestClient.java:186)
       at org.apache.kafka.connect.runtime.rest.RestClient.httpRequest(RestClient.java:140)
       at org.apache.kafka.connect.runtime.rest.RestClient.httpRequest(RestClient.java:101)
       at org.apache.kafka.connect.runtime.distributed.DistributedHerder.lambda$fenceZombieSourceTasks$23(DistributedHerder.java:1329)
       at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
       at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
       at java.base/java.lang.Thread.run(Thread.java:1583)
   ```
   
   Sorry for not looking at that first. This is a one-shot call: failures to contact the leader (such as the leader still starting up) are not retried, whereas the config forwarding code path _does_ retry. This change could reduce flakiness by shrinking the window in which this call can fail.
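To make the contrast between a one-shot call and a retried one concrete, here is a minimal sketch of a bounded retry loop around a call to the leader. Everything in it is hypothetical: `callWithRetries`, `LeaderStartingUpException`, the attempt count and the backoff are illustrative stand-ins, not Kafka's `RestClient` API or the actual forwarding logic.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

public class RetryingLeaderCall {

    /** Hypothetical stand-in for "leader is still starting up" errors. */
    static class LeaderStartingUpException extends RuntimeException {
        LeaderStartingUpException(String msg) { super(msg); }
    }

    /**
     * Hypothetical helper: invokes {@code call} up to {@code maxAttempts} times,
     * sleeping {@code backoffMs} between attempts, and only retries when
     * {@code isRetriable} accepts the thrown exception.
     */
    static <T> T callWithRetries(Callable<T> call, int maxAttempts, long backoffMs,
                                 Predicate<Exception> isRetriable) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                // Give up immediately on non-retriable errors or when out of attempts.
                if (!isRetriable.test(e) || attempt == maxAttempts) {
                    throw e;
                }
                Thread.sleep(backoffMs);
            }
        }
        throw new IllegalStateException("unreachable: loop always returns or throws");
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated leader that rejects the first two requests while "starting up".
        String result = callWithRetries(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new LeaderStartingUpException("still reading session key");
            }
            return "fenced";
        }, 5, 10L, e -> e instanceof LeaderStartingUpException);
        System.out.println(result + " after " + calls[0] + " attempts"); // fenced after 3 attempts
    }
}
```

A one-shot call corresponds to `maxAttempts = 1`: the first "still starting up" error propagates and, as in the log above, kills the task.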
   
   > which means that a worker is guaranteed to read the latest session key from the config topic before becoming the leader.
   
   Sure, but because validating the session key happens at a different time than checking `isLeader()`, you could:
   1. Read a stale session key
   2. Receive an incoming request, and validate against the stale session key
   3. Queue up a herder request
   4. Read the fresh session key
   5. Finish `readToEnd` inside `start`/`startServices`
   6. Join the group as the leader
   7. Check `isLeader()`
   8. Process the illegal request
   
   Since it's a malicious request, it doesn't have to wait for the group to form: it can come in before the worker joins the group and before the other workers are made aware of the leader.
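The interleaving above is a check-then-act race: the key check and the leadership check run at different times. Here is a minimal, single-threaded sketch that replays the eight steps deterministically. All names (`StaleKeyRaceDemo`, `cachedSessionKey`, `onRestRequest`, `drainHerderQueue`) are hypothetical, and plain strings stand in for request signatures; this models the ordering only and is not Kafka's herder or REST code.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class StaleKeyRaceDemo {
    // Hypothetical worker state; strings stand in for session keys / signatures.
    String cachedSessionKey = "key-v1"; // step 1: worker has only read a stale key
    boolean leader = false;
    final Queue<String> herderQueue = new ArrayDeque<>();

    // Key validation happens on the request path, against whatever key is cached now.
    boolean validate(String signedWith) {
        return signedWith.equals(cachedSessionKey);
    }

    // Steps 2-3: validate against the cached key, then queue a herder request.
    void onRestRequest(String signedWith) {
        if (validate(signedWith)) {
            herderQueue.add(signedWith);
        }
    }

    // Steps 7-8: the herder only checks leadership (not the key) before processing.
    int drainHerderQueue() {
        int processed = 0;
        while (!herderQueue.isEmpty()) {
            String request = herderQueue.poll();
            if (leader && request != null) {
                processed++;
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        StaleKeyRaceDemo worker = new StaleKeyRaceDemo();
        worker.onRestRequest("key-v1");      // steps 1-3: passes against the stale key
        worker.cachedSessionKey = "key-v2";  // step 4: fresh key is read
        worker.leader = true;                // steps 5-6: startup finishes, worker becomes leader
        int processed = worker.drainHerderQueue(); // steps 7-8
        System.out.println("processed=" + processed
                + ", valid under fresh key=" + worker.validate("key-v1"));
        // prints "processed=1, valid under fresh key=false"
    }
}
```

The queued request is processed even though it would be rejected under the fresh key, because nothing re-validates it at the moment `isLeader()` is checked.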


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
