[ https://issues.apache.org/jira/browse/KAFKA-9274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17040177#comment-17040177 ]

John Roesler commented on KAFKA-9274:
-------------------------------------

Thanks for the PRs, [~mjsax]!

I think GitHub is having some issues right now; I can't leave any comments on 
your PR, but I have a high-level question:

Maybe you can help me understand the impact of re-using the clients' `retries` 
configuration. Let's say I want to put Streams in "high resilience" mode, so I 
set `retries` to a large number, like 1,000. It seems like we would also pass 
this configuration to the clients, causing them to internally retry each call 
1,000 times as well, right?

If so, any client call could then take quite a long time, potentially 
preventing the thread from calling `poll` in time. So there would be a 
dependency between `retries` and `max.poll.interval.ms`.

Additionally, the internal retries would make the clients more likely to time 
out, which would then cause Streams to retry in its loop, effectively making 
the clients try each operation up to `retries^2` times. This might not have a 
negative practical effect, but it might be surprising. It does imply a 
dependency between `retries` and `request.timeout.ms`.
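To make the worst case concrete, here is a minimal sketch of the arithmetic (the class and method names are hypothetical, purely for illustration): if Streams retries an operation up to `retries` times and each attempt triggers up to `retries` internal client retries, the client may issue on the order of `retries * retries` requests for a single operation.

```java
public class NestedRetriesSketch {

    // Hypothetical arithmetic only: Streams retries an operation up to
    // streamsRetries times, and each of those attempts may trigger up to
    // clientRetries internal retries inside the embedded client, so the
    // worst-case number of requests is the product of the two.
    public static long worstCaseRequests(int streamsRetries, int clientRetries) {
        return (long) streamsRetries * clientRetries;
    }

    public static void main(String[] args) {
        // With retries=1000 shared by Streams and the clients, a single
        // operation could issue up to 1,000 * 1,000 = 1,000,000 requests.
        System.out.println(worstCaseRequests(1000, 1000)); // prints 1000000
    }
}
```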

It seems like an operator could "un-set" the config for clients by adding 
prefixed configs to set the clients back to their defaults:
{quote}{{retries: 1000}}
{{producer.retries: 1}}
{{admin.retries: 2147483647}}
{quote}
But this seems too esoteric.
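For concreteness, such an override might be assembled like this (a sketch only; the class name is hypothetical, the `producer.`/`admin.` key prefixes follow the documented Streams convention for client-specific overrides, and the values are just the ones from the quote above):

```java
import java.util.Properties;

public class RetriesOverrideSketch {

    // Sketch of "un-setting" the global `retries` for individual clients via
    // Streams' prefixed configs. Values mirror the example in the comment.
    public static Properties buildConfig() {
        Properties props = new Properties();
        // Streams-level "high resilience" setting.
        props.put("retries", "1000");
        // Reset the embedded clients so they do not also retry 1,000 times
        // internally on every call.
        props.put("producer.retries", "1");
        props.put("admin.retries", String.valueOf(Integer.MAX_VALUE));
        return props;
    }

    public static void main(String[] args) {
        Properties props = buildConfig();
        System.out.println("streams retries  = " + props.getProperty("retries"));
        System.out.println("producer retries = " + props.getProperty("producer.retries"));
        System.out.println("admin retries    = " + props.getProperty("admin.retries"));
    }
}
```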

What do you think about introducing a new configuration instead, to prevent 
interference between the Streams retry and the Clients' retries?

> Gracefully handle timeout exceptions on Kafka Streams
> -----------------------------------------------------
>
>                 Key: KAFKA-9274
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9274
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Boyang Chen
>            Assignee: Matthias J. Sax
>            Priority: Major
>              Labels: kip
>
> Right now, Streams does not treat timeout exceptions as retriable in 
> general; they are thrown to the application level. If not handled by the 
> user, this unfortunately kills the stream thread.
> In fact, timeouts happen mostly due to network issues or server-side 
> unavailability, so a hard failure on the client seems to be overkill.
> We would like to discuss the best practice for handling timeout exceptions 
> in Streams. The current state is still brainstorming and consolidating all 
> the cases that involve timeout exceptions within this ticket.
> This ticket is backed by KIP-572: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-572%3A+Improve+timeouts+and+retries+in+Kafka+Streams]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
