C0urante commented on PR #16788:
URL: https://github.com/apache/kafka/pull/16788#issuecomment-2272238495

   > I think that's still possible, if we're the first leader and don't get a 
chance to write a key to the topic before executing a request from a caller.
   
   As long as we're using the `sessioned` protocol, if there's no key in the 
topic and we're receiving a request, then the request is guaranteed to fail, 
since it won't have a signature. I think this is only really possible if a 
cluster is upgraded from some other protocol to `sessioned`; otherwise, the 
leader will have a chance to write a new key to the config topic before doing 
anything else like writing new connector configs. We could possibly add retry 
logic for followers to wait for a session key to be published before sending 
requests to the leader when the `sessioned` protocol is enabled, but unless I'm 
mistaken, this would probably require unrelated changes to this PR and can 
probably be considered out of scope.
   
   > If the key is null, we could reject it with a 403 and not reveal that the 
cluster is still starting up, triggering the retry logic on the requesting 
worker.
   
   Returning a 403 response instead of a 503 won't change the retry logic of 
the worker issuing the request. We still retry on (almost) every error when 
forwarding task configs (unless the connector has generated too many tasks 
configs, as of 
[KIP-1004](https://cwiki.apache.org/confluence/display/KAFKA/KIP-1004%3A+Enforce+tasks.max+property+in+Kafka+Connect)),
 and on no errors when requesting zombie fencings. The only difference is that 
we'd log something at [debug 
level](https://github.com/apache/kafka/blob/5596f9e1d561dfef46a83fbd7264e6745ca538cc/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L2190)
 instead of at [error 
level](https://github.com/apache/kafka/blob/5596f9e1d561dfef46a83fbd7264e6745ca538cc/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L2195).
 Unless there's a serious security concern (which we should probably address 
via a separate channel), I'd prefer not to revert 
 the 503 logic, since it is a useful signal for people reading worker logs 
(including me... it really helped when debugging these test failures!), and we 
might even leverage it for retry logic in the future.
   
   > We could also move the key verification to the herder thread, so that 
requests made during startup are delayed until the herder has finished 
starting, and we're sure that we've tried to install a key if one does not 
already exist. This might be less secure though, as it would allow a 
non-authenticated caller to saturate the herder tick thread.
   
   Given the small security issue this presents, unless there's still a 
correctness problem I'm missing, I don't think this is a superior alternative. 
But still, if you have further concerns, I'd be happy to hear them.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to