Jay Kreps created KAFKA-1441:
--------------------------------

             Summary: Purgatory purge causes latency spikes
                 Key: KAFKA-1441
                 URL: https://issues.apache.org/jira/browse/KAFKA-1441
             Project: Kafka
          Issue Type: Bug
            Reporter: Jay Kreps

The request purgatory has a funky thing where it periodically loops over all watches and purges them. If you have a fair number of partitions you can accumulate lots of watches, and purging them can take a long time. During this time all expiry is halted.

Here is an example log. The watcher purge runs from 21:07:41,969 to 21:07:42,305 (about 336 ms), and the next two requests expire after 106ms instead of the usual 10-11ms:

[2014-05-08 21:07:41,950] INFO ExpiredRequestReaper-2 Expired request after 10ms: 5829 (kafka.server.RequestPurgatory$ExpiredRequestReaper)
[2014-05-08 21:07:41,952] INFO ExpiredRequestReaper-2 Expired request after 10ms: 5882 (kafka.server.RequestPurgatory$ExpiredRequestReaper)
[2014-05-08 21:07:41,967] INFO ExpiredRequestReaper-2 Expired request after 11ms: 5884 (kafka.server.RequestPurgatory$ExpiredRequestReaper)
[2014-05-08 21:07:41,968] INFO ExpiredRequestReaper-2 Purging purgatory (kafka.server.RequestPurgatory$ExpiredRequestReaper)
[2014-05-08 21:07:41,969] INFO ExpiredRequestReaper-2 Purged 0 requests from delay queue. (kafka.server.RequestPurgatory$ExpiredRequestReaper)
[2014-05-08 21:07:42,305] INFO ExpiredRequestReaper-2 Purged 340809 (watcher) requests. (kafka.server.RequestPurgatory$ExpiredRequestReaper)
[2014-05-08 21:07:42,305] INFO ExpiredRequestReaper-2 Expired request after 106ms: 5847 (kafka.server.RequestPurgatory$ExpiredRequestReaper)
[2014-05-08 21:07:42,305] INFO ExpiredRequestReaper-2 Expired request after 106ms: 5904 (kafka.server.RequestPurgatory$ExpiredRequestReaper)
[2014-05-08 21:07:42,328] INFO ExpiredRequestReaper-2 Expired request after 10ms: 5908 (kafka.server.RequestPurgatory$ExpiredRequestReaper)
[2014-05-08 21:07:42,329] INFO ExpiredRequestReaper-2 Expired request after 10ms: 5852 (kafka.server.RequestPurgatory$ExpiredRequestReaper)
[2014-05-08 21:07:42,343] INFO ExpiredRequestReaper-2 Expired request after 11ms: 5854 (kafka.server.RequestPurgatory$ExpiredRequestReaper)

Combined with our buggy purgatory request implementations, which can sometimes hit their expiration, this can lead to huge latency spikes.
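
For illustration, here is a minimal Java sketch of the kind of full scan described above. It is not the actual kafka.server.RequestPurgatory code: the class PurgatorySketch, its Request type, and the methods watch, purgeSatisfied and watchCount are hypothetical names made up for this example. It only aims to show why the purge cost grows with the total number of watch entries (one per request per watched key), so a thread that runs this scan cannot do anything else, such as expiring requests, until the loop returns.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of a purgatory; NOT Kafka's RequestPurgatory.
final class PurgatorySketch {

    /** A delayed request that may be watched under several keys (e.g. one per partition). */
    static final class Request {
        final long id;
        boolean satisfied = false;

        Request(long id) {
            this.id = id;
        }
    }

    // key -> requests currently watching that key
    private final Map<String, List<Request>> watchers = new HashMap<>();

    /** Registers a request under a key; the same request can be registered under many keys. */
    void watch(String key, Request request) {
        watchers.computeIfAbsent(key, k -> new ArrayList<>()).add(request);
    }

    /**
     * Full scan over every watcher list, dropping entries whose request is
     * already satisfied. The work is proportional to the total number of
     * watch entries, and the calling thread is busy for the whole scan.
     */
    int purgeSatisfied() {
        int purged = 0;
        for (List<Request> list : watchers.values()) {
            Iterator<Request> it = list.iterator();
            while (it.hasNext()) {
                if (it.next().satisfied) {
                    it.remove();
                    purged++;
                }
            }
        }
        return purged;
    }

    /** Total number of watch entries across all keys. */
    int watchCount() {
        int total = 0;
        for (List<Request> list : watchers.values()) {
            total += list.size();
        }
        return total;
    }
}
```

In this model a request watched under N keys contributes N entries to the scan, which is one way a broker with many partitions could end up with hundreds of thousands of entries to walk in a single purge.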