Hey Jun,

Thanks for reading the KIP!
> controller.quorum.fetch.timeout.ms defaults to 2 secs. Is that a bit too
> low?

I personally think that controller.quorum.fetch.timeout.ms is best kept at
its current value. It determines how long we wait before starting an
election. If an active controller is truly down, it will take a minimum of
controller.quorum.fetch.timeout.ms for us to detect that this has occurred,
and this applies to all leadership changes. Further, the timeout drives
Raft state changes, so increasing it could have knock-on effects. In other
words, the timeout affects a huge number of things in KRaft, while raising
the maximum bytes sent per fetch has a much more "narrow" effect.

> Orthogonally, in the case when the network latency is long, one can
> typically tune socket.receive.buffer.bytes to improve the throughput.

In the case where I noticed this problem occurring, it was the result of
lower overall network throughput, not just higher latency. Increasing
socket.receive.buffer.bytes (which I assume increases the TCP receive
window) will not help in all cases.
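To make the tradeoff concrete, here is a back-of-the-envelope example; the
numbers are illustrative and not from the KIP:

    uncapped FETCH_SNAPSHOT response: 100 MB
    effective network throughput:      10 MB/s (degraded network)
    time to stream the response:      100 MB / 10 MB/s = 10 s  >>  2 s fetch timeout

    response capped by controller.quorum.fetch.snapshot.max.bytes: 10 MB
    time per response:                 10 MB / 10 MB/s =  1 s   <   2 s fetch timeout

With the cap, every round trip completes well inside the fetch timeout, so
the fetching replica keeps making progress instead of timing out and (if it
is a voter) disrupting the leader with an election. The cost is more round
trips, which is exactly the throughput-for-liveness tradeoff the KIP
describes.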
Best,
Jonah

On Tue, Oct 7, 2025 at 2:07 PM Jun Rao <[email protected]> wrote:

> Hi, Jonah,
>
> Thanks for the KIP. Just a couple of minor comments.
>
> controller.quorum.fetch.timeout.ms defaults to 2 secs. Is that a bit too
> low?
>
> Orthogonally, in the case when the network latency is long, one can
> typically tune socket.receive.buffer.bytes to improve the throughput.
>
> Jun
>
> On Fri, Sep 26, 2025 at 10:29 AM José Armando García Sancio
> <[email protected]> wrote:
>
> > Thanks for the changes Jonah. Looking forward to the implementation of
> > this feature.
> >
> > On Fri, Sep 26, 2025 at 12:33 PM Jonah Hooper
> > <[email protected]> wrote:
> > >
> > > Thanks for the feedback José! Did some work to apply the KIP to the
> > > correct format.
> > >
> > > > The motivation section starts with the solution without motivating
> > > > the problem. I think you want to delete the first paragraph in the
> > > > motivation section.
> > >
> > > Done.
> > >
> > > > In the motivation section you have "non-terminating loop where they
> > > > cannot join the cluster." They are technically not joining the
> > > > cluster. The issue is a liveness issue. Because the FETCH_SNAPSHOT
> > > > RPC doesn't complete within the fetch timeout configuration, the
> > > > fetching replica cannot make progress. In the worst case, if it is
> > > > a voter/controller, it further disrupts the leader by becoming a
> > > > candidate and increasing the epoch.
> > >
> > > Updated the motivation to reflect this.
> > >
> > > > In the public interface section, please make it clear that the
> > > > changes are the addition of two new configurations. Let's also add
> > > > descriptions for these configurations.
> > >
> > > Added descriptions.
> > >
> > > > Please fill out "Compatibility, Deprecation, and Migration Plan".
> > > > Why is this safe and backward compatible?
> > >
> > > Done.
> > >
> > > > In the test section you have "We can perform one-off tests which
> > > > evaluate the effect of degraded networking on snapshots. It's
> > > > better to design system-tests which focus on correctness by using
> > > > pathologically low values for
> > > > controller.quorum.fetch.snapshot.max.bytes and
> > > > controller.quorum.fetch.max.bytes. An example scenario would be a
> > > > controller joining a quorum with a snapshot with a known size and
> > > > then attempting to fetch the snapshot in small (say 64 KB) or even
> > > > pathologically small values like 1 byte." I don't fully understand
> > > > this recommendation, but if my understanding is correct, this
> > > > sounds very complicated and flaky to implement. Can't we just add
> > > > protocol tests to KafkaRaftClient*Test that show these
> > > > configurations being used when handling FETCH and FETCH_SNAPSHOT
> > > > requests?
> > >
> > > Good point; that section is too specific. I've left it a bit more
> > > vague; unit tests are sufficient provided we can "mock" a situation
> > > where we retrieve less data than is available.
> > >
> > > Best,
> > > Jonah
> > >
> > > On Mon, Sep 22, 2025 at 12:31 PM José Armando García Sancio
> > > <[email protected]> wrote:
> > >
> > > > Hi Jonah,
> > > >
> > > > Thanks for the changes. The proposed solution looks good to me at a
> > > > high level. I just have some minor comments.
> > > >
> > > > The motivation section starts with the solution without motivating
> > > > the problem. I think you want to delete the first paragraph in the
> > > > motivation section.
> > > >
> > > > In the motivation section you have "non-terminating loop where they
> > > > cannot join the cluster." They are technically not joining the
> > > > cluster. The issue is a liveness issue. Because the FETCH_SNAPSHOT
> > > > RPC doesn't complete within the fetch timeout configuration, the
> > > > fetching replica cannot make progress. In the worst case, if it is
> > > > a voter/controller, it further disrupts the leader by becoming a
> > > > candidate and increasing the epoch.
> > > >
> > > > In the motivation section you have "The proposed default
> > > > configurations tradeoff a reduction in the throughput of data
> > > > transmitted by Fetch and FetchSnapshot requests with a liveness and
> > > > correctness improvement." This is also the solution and not part of
> > > > the motivation.
> > > >
> > > > In the public interface section, please make it clear that the
> > > > changes are the addition of two new configurations. Let's also add
> > > > descriptions for these configurations.
> > > >
> > > > Please fill out "Compatibility, Deprecation, and Migration Plan".
> > > > Why is this safe and backward compatible?
> > > >
> > > > In the test section you have "We can perform one-off tests which
> > > > evaluate the effect of degraded networking on snapshots. It's
> > > > better to design system-tests which focus on correctness by using
> > > > pathologically low values for
> > > > controller.quorum.fetch.snapshot.max.bytes and
> > > > controller.quorum.fetch.max.bytes. An example scenario would be a
> > > > controller joining a quorum with a snapshot with a known size and
> > > > then attempting to fetch the snapshot in small (say 64 KB) or even
> > > > pathologically small values like 1 byte." I don't fully understand
> > > > this recommendation, but if my understanding is correct, this
> > > > sounds very complicated and flaky to implement. Can't we just add
> > > > protocol tests to KafkaRaftClient*Test that show these
> > > > configurations being used when handling FETCH and FETCH_SNAPSHOT
> > > > requests?
> > > >
> > > > I didn't comment on the proposed changes section. I'll comment
> > > > after the comments above are addressed.
> > > >
> > > > Thanks
> > > > --
> > > > -José
> >
> > --
> > -José
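P.S. To make the protocol-test idea from the quoted thread concrete, here
is a self-contained sketch of the scenario: fetching a snapshot of known
size in chunks capped at a pathologically small max-bytes value and
asserting that the fetch still terminates. The class and the
fetchSnapshotChunk helper are hypothetical stand-ins for illustration
only, not actual KafkaRaftClient*Test code:

    import java.nio.ByteBuffer;

    // Illustration only: fetchSnapshotChunk is a hypothetical stand-in for
    // a FETCH_SNAPSHOT round trip, not a real Kafka API.
    public class ChunkedSnapshotFetchSketch {

        // Return at most maxBytes of the snapshot starting at position,
        // mimicking a response capped by
        // controller.quorum.fetch.snapshot.max.bytes.
        static ByteBuffer fetchSnapshotChunk(byte[] snapshot, int position, int maxBytes) {
            int len = Math.min(maxBytes, snapshot.length - position);
            return ByteBuffer.wrap(snapshot, position, len).slice();
        }

        public static void main(String[] args) {
            byte[] snapshot = new byte[64 * 1024]; // snapshot of known size
            int maxBytes = 1;                      // pathologically small cap
            int position = 0;
            int rounds = 0;
            while (position < snapshot.length) {
                position += fetchSnapshotChunk(snapshot, position, maxBytes).remaining();
                rounds++;
            }
            // Each round makes progress, so the fetch terminates in exactly
            // ceil(size / maxBytes) rounds no matter how small the cap is.
            if (rounds != (snapshot.length + maxBytes - 1) / maxBytes) {
                throw new AssertionError("unexpected round count: " + rounds);
            }
            System.out.println("fetched " + position + " bytes in " + rounds + " rounds");
        }
    }

The real tests would of course drive KafkaRaftClient itself and assert on
the FETCH and FETCH_SNAPSHOT responses, but the terminating-loop invariant
above is the property they need to show.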
