[ https://issues.apache.org/jira/browse/KAFKA-12793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

KahnCheny updated KAFKA-12793:
------------------------------
    Attachment: KAFKA-12793__KIP-693,_Client-side_supports_partitioned_circuit_breaker.patch

> Client-side Circuit Breaker for Partition Write Errors
> ------------------------------------------------------
>
>                 Key: KAFKA-12793
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12793
>             Project: Kafka
>          Issue Type: New Feature
>          Components: clients
>            Reporter: KahnCheny
>            Priority: Major
>
> When Kafka is used to build data pipelines in mission-critical business
> scenarios, availability and throughput are the most important operational
> goals, and they must be maintained in the presence of transient or permanent
> local failures. One typical situation that requires Ops intervention is disk
> failure: some partitions suffer long write latency caused by extremely high
> disk utilization. Since all partitions share the same buffer under the
> current producer thread model, the buffer fills up quickly and eventually
> the healthy partitions are impacted as well. The cluster-level success rate
> and timeout ratio degrade until the local infrastructure issue is resolved.
> One way to mitigate this issue is to add a client-side mechanism that
> short-circuits problematic partitions during transient failures. A similar
> approach is used in other distributed systems and RPC frameworks.
> We propose a configuration-driven circuit-breaking mechanism that allows
> the Kafka client to ‘mute’ partitions when certain conditions are met. The
> mechanism adds callbacks to the Sender class workflow that allow partitions
> to be filtered based on a given policy.
> The client can choose an implementation that fits its specific failure
> scenario by providing client-side custom implementations of Partitioner and
> ProducerInterceptor:
> * Customize the implementation of ProducerInterceptor to choose the
> strategy for muting partitions.
> * Customize the implementation of Partitioner to choose the strategy for
> filtering partitions.
> Muting partitions has an impact when the topic contains keyed messages, as
> messages with the same key will be written to more than one partition
> during the recovery period. We believe this can be an explicit trade-off
> that the application makes between availability and message ordering.
> KIP-693:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-693%3A+Client-side+Circuit+Breaker+for+Partition+Write+Errors
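
The issue leaves the muting condition to the chosen policy. A minimal sketch of one possible policy, assuming a simple rule of "mute a partition after N consecutive write failures, then let it be tried again after a fixed cooldown", could look like this. `PartitionCircuitBreaker`, `failureThreshold` and `muteDurationMs` are hypothetical names, not part of the Kafka client API or of KIP-693. In a real producer, `recordSuccess`/`recordFailure` could be fed from `ProducerInterceptor.onAcknowledgement(RecordMetadata, Exception)`, and `isMuted` consulted from the Partitioner.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical per-partition circuit breaker (assumed policy: N consecutive
// failures trip the breaker, which stays open for a fixed cooldown).
// Not part of the Kafka client API.
final class PartitionCircuitBreaker {
    private final int failureThreshold;  // consecutive failures before muting
    private final long muteDurationMs;   // how long a tripped partition stays muted
    private final LongSupplier clock;    // time source, e.g. System::currentTimeMillis

    private final Map<Integer, Integer> consecutiveFailures = new HashMap<>();
    private final Map<Integer, Long> mutedUntil = new HashMap<>();

    PartitionCircuitBreaker(int failureThreshold, long muteDurationMs, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.muteDurationMs = muteDurationMs;
        this.clock = clock;
    }

    // Methods are synchronized because acknowledgement callbacks and
    // partition selection may run on different threads.

    // A successful write closes the breaker and resets the failure count.
    synchronized void recordSuccess(int partition) {
        consecutiveFailures.remove(partition);
        mutedUntil.remove(partition);
    }

    // A failed or timed-out write counts toward the threshold.
    synchronized void recordFailure(int partition) {
        int failures = consecutiveFailures.merge(partition, 1, Integer::sum);
        if (failures >= failureThreshold) {
            // Trip the breaker: mute the partition and start counting afresh.
            mutedUntil.put(partition, clock.getAsLong() + muteDurationMs);
            consecutiveFailures.remove(partition);
        }
    }

    // Muted until the cooldown expires; afterwards the partition is tried again.
    synchronized boolean isMuted(int partition) {
        Long until = mutedUntil.get(partition);
        if (until == null) {
            return false;
        }
        if (clock.getAsLong() >= until) {
            mutedUntil.remove(partition); // cooldown over: let traffic probe it
            return false;
        }
        return true;
    }
}
```

Injecting the clock keeps the breaker testable. After the cooldown this sketch needs the full threshold of failures again before re-muting; a stricter "half-open" variant would re-mute on the first failed probe.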

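On the Partitioner side, one possible filtering strategy for records without a key is round-robin that skips muted partitions. The sketch below uses a hypothetical `RoundRobinSkippingMuted` helper and takes the muting decision as an `IntPredicate`, so it stays independent of how partitions get muted. Inside a real `Partitioner.partition(...)`, the partition count would come from the `Cluster` argument (e.g. `cluster.partitionCountForTopic(topic)`).

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntPredicate;

// Hypothetical helper, not the Kafka API: round-robin over partitions,
// skipping any partition the muting policy currently rejects.
final class RoundRobinSkippingMuted {
    private final AtomicInteger counter = new AtomicInteger();

    // Returns the next unmuted partition in round-robin order, or -1 if every
    // partition is muted (the caller decides how to handle that case).
    int next(int numPartitions, IntPredicate isMuted) {
        for (int attempt = 0; attempt < numPartitions; attempt++) {
            // floorMod keeps the index non-negative even after counter overflow.
            int candidate = Math.floorMod(counter.getAndIncrement(), numPartitions);
            if (!isMuted.test(candidate)) {
                return candidate;
            }
        }
        return -1;
    }
}
```

With 4 partitions and partition 1 muted, successive calls return 0, 2, 3, 0. Returning -1 when everything is muted is an assumption of this sketch: a real partitioner must return a valid partition, so it would likely fall back to ignoring the mute list in that case.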

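The ordering trade-off for keyed messages can be made concrete with a sketch that keeps a key on its usual partition while that partition is healthy and probes forward to the next unmuted partition otherwise. `KeyedPartitionWithFallback` is hypothetical, and `Arrays.hashCode` is only a stand-in hash: Kafka's default partitioner uses murmur2 on the serialized key, so the partition numbers here will not match a real producer.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.function.IntPredicate;

// Hypothetical keyed partition selection with fallback; not the Kafka API.
final class KeyedPartitionWithFallback {
    // Stand-in hash (NOT Kafka's murmur2): maps a key to its usual partition.
    static int preferredPartition(String key, int numPartitions) {
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        return Math.floorMod(hash, numPartitions);
    }

    // Uses the key's usual partition when it is not muted; otherwise probes
    // forward to the next unmuted one. While the usual partition is muted,
    // records for this key land elsewhere, which is where per-key ordering
    // can break. Returns -1 if every partition is muted.
    static int choose(String key, int numPartitions, IntPredicate isMuted) {
        int preferred = preferredPartition(key, numPartitions);
        for (int offset = 0; offset < numPartitions; offset++) {
            int candidate = (preferred + offset) % numPartitions;
            if (!isMuted.test(candidate)) {
                return candidate;
            }
        }
        return -1;
    }
}
```

If the usual partition `home` of key `"order-42"` is muted, its records go to `(home + 1) % numPartitions` until `home` is unmuted, so a consumer may read that key's records out of order across the two partitions.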

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
