[ 
https://issues.apache.org/jira/browse/KAFKA-6399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16393486#comment-16393486
 ] 

Matthias J. Sax commented on KAFKA-6399:
----------------------------------------

Thanks for the feedback. It's a tricky question and I am personally not sure 
what I prefer. My thinking is as follows: initially, we used 30 seconds, which 
was too short because of store restore time. Since we set it to MAX_VALUE, I 
cannot remember any user issues related to the config. Thus, it might even be 
ok to keep the default at MAX_VALUE. Whether we still need MAX_VALUE is 
questionable, though, as we moved the restore code into the main loop and got 
rid of the root cause that forced us to set it to MAX_VALUE. However, because 
I can't remember any issues with MAX_VALUE, it seems to work in practice even 
if we don't need such a high value. We know from some user reports that 
processing time can vary widely; thus, even if we set it to 5 minutes, it 
would bite some users if they don't increase the setting. Keeping MAX_VALUE 
would be a safe bet for this case. However, I am a little concerned about a 
badly behaving app that never times out if the default is MAX_VALUE: user 
code could loop infinitely, for example.

Long story short: I think it boils down to the question of whether we want 
the default settings to be robust with regard to "make progress" or whether 
the default setting should be more "error sensitive". I guess that, in most 
cases, users will want to (or should) adjust this value anyway, independent of 
the default we choose (some users need to increase it, and others should 
decrease it to enable error detection in the first place).
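
For anyone who wants to adjust the value rather than rely on the default, 
here is a minimal sketch of such an override. It uses only 
java.util.Properties and the plain config keys; the class name, the helper 
buildConfig(), the application id, and the broker address are made up for 
illustration, and the 5 minute value is just the number mentioned above, not 
a recommendation.

```java
import java.time.Duration;
import java.util.Properties;

// Hypothetical helper class, not part of Kafka.
class StreamsPollIntervalSketch {

    // Builds a client configuration that overrides max.poll.interval.ms.
    // The config is an int in milliseconds, so the value must fit into
    // the range (0, Integer.MAX_VALUE].
    static Properties buildConfig(String applicationId,
                                  String bootstrapServers,
                                  Duration maxPollInterval) {
        long millis = maxPollInterval.toMillis();
        if (millis <= 0 || millis > Integer.MAX_VALUE) {
            throw new IllegalArgumentException(
                "max.poll.interval.ms out of range: " + millis);
        }
        Properties props = new Properties();
        props.setProperty("application.id", applicationId);
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("max.poll.interval.ms", Long.toString(millis));
        return props;
    }

    public static void main(String[] args) {
        // Example values only: an app that wants stuck user code to be
        // detected after 5 minutes without a poll() call.
        Properties props = buildConfig(
            "my-streams-app", "localhost:9092", Duration.ofMinutes(5));
        System.out.println(props.getProperty("max.poll.interval.ms"));
    }
}
```

The resulting Properties object would then be handed to the Streams 
application as its configuration. An app with slow processing would pass a 
larger Duration, while an app that wants stuck user code to be detected 
sooner would pass a smaller one.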

> Consider reducing "max.poll.interval.ms" default for Kafka Streams
> ------------------------------------------------------------------
>
>                 Key: KAFKA-6399
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6399
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>    Affects Versions: 1.0.0
>            Reporter: Matthias J. Sax
>            Priority: Minor
>
> In Kafka {{0.10.2.1}} we changed the default value of 
> {{max.poll.interval.ms}} for Kafka Streams to {{Integer.MAX_VALUE}}. The 
> reason was that long state restore phases during rebalance could yield 
> "rebalance storms", as consumers dropped out of a consumer group even if 
> they were healthy because they didn't call {{poll()}} during the state 
> restore phase.
> In versions {{0.11}} and {{1.0}} the state restore logic was improved a lot, 
> and thus Kafka Streams now does call {{poll()}} even during the restore 
> phase. Therefore, we might consider setting a smaller timeout for 
> {{max.poll.interval.ms}} to detect badly behaving Kafka Streams applications 
> (i.e., targeting user code) that don't make progress any more during regular 
> operations.
> The open question is what a good default might be. Maybe the actual 
> consumer default of 30 seconds might be sufficient. During one {{poll()}} 
> round trip, we would only call {{restoreConsumer.poll()}} once and restore a 
> single batch of records. This should take way less time than 30 seconds.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
