Steven Schlansker created KAFKA-13593:
-----------------------------------------
Summary: ThrottledChannelReaper slows broker shutdown by multiple
seconds
Key: KAFKA-13593
URL: https://issues.apache.org/jira/browse/KAFKA-13593
Project: Kafka
Issue Type: Improvement
Components: core
Affects Versions: 3.0.0
Reporter: Steven Schlansker
We run an embedded KRaft broker in integration tests, to test that our Producer
/ Consumers are all hooked up and verify our overall pipeline.
While trying to deliver CI speed improvements, we noticed that the majority of
time for a small test is actually spent in Kafka broker shutdown. From the
logs, it looks like the ClientQuotaManager has multiple ThrottledChannelReaper
threads and each of them takes up to a second to shutdown.
{code:java}
2022-01-12T15:26:31.932Z [main] INFO kafka.log.LogManager - Shutdown complete.
2022-01-12T15:26:31.934Z [main] INFO
k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-Fetch]:
Shutting down
2022-01-12T15:26:32.311Z [main] INFO
k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-Fetch]:
Shutdown completed
2022-01-12T15:26:32.311Z [ThrottledChannelReaper-Fetch] INFO
k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-Fetch]:
Stopped
2022-01-12T15:26:32.311Z [main] INFO
k.s.ClientQuotaManager$ThrottledChannelReaper -
[ThrottledChannelReaper-Produce]: Shutting down
2022-01-12T15:26:33.312Z [ThrottledChannelReaper-Produce] INFO
k.s.ClientQuotaManager$ThrottledChannelReaper -
[ThrottledChannelReaper-Produce]: Stopped
2022-01-12T15:26:33.312Z [main] INFO
k.s.ClientQuotaManager$ThrottledChannelReaper -
[ThrottledChannelReaper-Produce]: Shutdown completed
2022-01-12T15:26:33.312Z [main] INFO
k.s.ClientQuotaManager$ThrottledChannelReaper -
[ThrottledChannelReaper-Request]: Shutting down
2022-01-12T15:26:34.315Z [ThrottledChannelReaper-Request] INFO
k.s.ClientQuotaManager$ThrottledChannelReaper -
[ThrottledChannelReaper-Request]: Stopped
2022-01-12T15:26:34.315Z [main] INFO
k.s.ClientQuotaManager$ThrottledChannelReaper -
[ThrottledChannelReaper-Request]: Shutdown completed
2022-01-12T15:26:34.315Z [main] INFO
k.s.ClientQuotaManager$ThrottledChannelReaper -
[ThrottledChannelReaper-ControllerMutation]: Shutting down
2022-01-12T15:26:35.317Z [main] INFO
k.s.ClientQuotaManager$ThrottledChannelReaper -
[ThrottledChannelReaper-ControllerMutation]: Shutdown completed
2022-01-12T15:26:35.317Z [ThrottledChannelReaper-ControllerMutation] INFO
k.s.ClientQuotaManager$ThrottledChannelReaper -
[ThrottledChannelReaper-ControllerMutation]: Stopped{code}
Inspecting the code, the ThrottledChannelReaper threads are marked as not
interruptible, so ShutdownableThread does not interrupt the worker on shutdown.
Unfortunately, the doWork method polls with a 1 second timeout. So you wait up
to 1s for every type of quota, in this case for a total of almost 4s.
While this is not a problem for production Kafka brokers, where instances are
expected to stay up for months, in tests that expect to spin up and down it is
easily noticed and adds real overhead to CI.
Suggested possible remediations:
* Allow ThrottledChannelReaper to be interrupted by ShutdownableThread
* ThrottledChannelReaper overrides initiateShutdown and after calling
{{super.initiateShutdown}} then enqueues a {{null}} element on the delayQueue
to force worker thread to notice shutdown state
* Reduce 1s poll timeout to something small (carries overhead penalty for all
users though, so this is less desirable), or make it configurable so we can set
it to e.g. 10ms in unit tests
--
This message was sent by Atlassian Jira
(v8.20.1#820001)