Matthias J. Sax created KAFKA-6399:
--------------------------------------
Summary: Consider reducing "max.poll.interval.ms" default for
Kafka Streams
Key: KAFKA-6399
URL: https://issues.apache.org/jira/browse/KAFKA-6399
Project: Kafka
Issue Type: Bug
Components: streams
Affects Versions: 1.0.0
Reporter: Matthias J. Sax
Priority: Minor
In Kafka {{0.10.2.1}} we changed the default value of {{max.poll.interval.ms}}
for Kafka Streams to {{Integer.MAX_VALUE}}. The reason was that long state
restore phases during a rebalance could cause "rebalance storms": healthy
consumers dropped out of the consumer group because they did not call
{{poll()}} during the state restore phase.
In versions {{0.11}} and {{1.0}} the state restore logic was improved
considerably, and Kafka Streams now calls {{poll()}} even during the restore
phase. Therefore, we might consider setting a smaller timeout for
{{max.poll.interval.ms}} to detect badly behaving Kafka Streams applications
(i.e., targeting user code) that no longer make progress during regular
operations.
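Until a new default is agreed on, an application can opt into a smaller timeout itself. The following is a minimal sketch, not taken from the Kafka code base: it assumes the standard {{consumer.}} config prefix that Kafka Streams forwards to its main consumer, uses plain string keys so it compiles without kafka-clients on the classpath, and uses placeholder values for {{application.id}} and {{bootstrap.servers}}. The class name, method name, and the 30-second value are purely illustrative.

```java
import java.util.Properties;

public class StreamsPollIntervalConfig {
    // Assumption: "consumer." is the prefix Kafka Streams uses to pass settings to its consumer.
    static final String CONSUMER_PREFIX = "consumer.";
    static final String MAX_POLL_INTERVAL_MS = "max.poll.interval.ms";

    /** Builds Streams properties that override max.poll.interval.ms for the consumer. */
    static Properties withMaxPollInterval(long millis) {
        Properties props = new Properties();
        props.put("application.id", "example-app");        // hypothetical application id
        props.put("bootstrap.servers", "localhost:9092");  // hypothetical broker address
        // Stored as a string; Kafka's config parsing converts it to an int.
        props.put(CONSUMER_PREFIX + MAX_POLL_INTERVAL_MS, Long.toString(millis));
        return props;
    }

    public static void main(String[] args) {
        // 30 seconds is only the value floated in this ticket, not a recommendation.
        Properties props = withMaxPollInterval(30_000L);
        System.out.println(props.getProperty(CONSUMER_PREFIX + MAX_POLL_INTERVAL_MS));
    }
}
```

In a real application these properties would be passed to the {{KafkaStreams}} constructor together with the topology.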
The open question is what a good default would be. Maybe the actual
consumer default of 30 seconds would be sufficient. During one {{poll()}}
round trip, we would call {{restoreConsumer.poll()}} only once and restore a
single batch of records. This should take far less than 30 seconds.
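The reasoning above can be illustrated with a small, self-contained sketch. It is not Kafka code: it models only the rule that a consumer is considered failed once the gap between two consecutive {{poll()}} calls exceeds {{max.poll.interval.ms}}. The class and method names are hypothetical, and the timestamps are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class PollGapCheck {
    /**
     * Returns the index of the first poll() call that arrives more than
     * maxPollIntervalMs after the previous one, or -1 if every gap is within the limit.
     */
    static int firstMissedDeadline(List<Long> pollTimestampsMs, long maxPollIntervalMs) {
        for (int i = 1; i < pollTimestampsMs.size(); i++) {
            long gap = pollTimestampsMs.get(i) - pollTimestampsMs.get(i - 1);
            if (gap > maxPollIntervalMs) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Each round trip restores one batch and returns to poll() within a few seconds.
        List<Long> healthy = Arrays.asList(0L, 2_000L, 4_500L, 6_000L);
        // User code blocks for 38 seconds between the second and third poll().
        List<Long> stuck = Arrays.asList(0L, 2_000L, 40_000L);

        System.out.println(firstMissedDeadline(healthy, 30_000L)); // prints -1
        System.out.println(firstMissedDeadline(stuck, 30_000L));   // prints 2
    }
}
```

With {{Integer.MAX_VALUE}} as the limit, the second case would never be flagged, which is the kind of stuck application a smaller default is meant to surface.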