[
https://issues.apache.org/jira/browse/KAFKA-12793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
KahnCheny updated KAFKA-12793:
------------------------------
Summary: Client-side Circuit Breaker for Partition Write Errors (was:
KIP-693 Client-side Circuit Breaker for Partition Write Errors)
> Client-side Circuit Breaker for Partition Write Errors
> ------------------------------------------------------
>
> Key: KAFKA-12793
> URL: https://issues.apache.org/jira/browse/KAFKA-12793
> Project: Kafka
> Issue Type: New Feature
> Components: clients
> Reporter: KahnCheny
> Priority: Major
>
> When Kafka is used to build data pipelines in mission-critical business
> scenarios, availability and throughput are the most important operational
> goals, and they must be maintained in the presence of transient or permanent
> local failures. One typical situation that requires Ops intervention is disk
> failure: some partitions see long write latency caused by extremely high
> disk utilization. Because all partitions share the same buffer under the
> current producer thread model, the buffer fills up quickly and eventually
> the healthy partitions are impacted as well. The cluster-level success rate
> and timeout ratio degrade until the local infrastructure issue is resolved.
> One way to mitigate this issue is to add a client-side mechanism that
> short-circuits problematic partitions during transient failures. Similar
> approaches are applied in other distributed systems and RPC frameworks.
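
To make the idea concrete, here is a minimal sketch of a per-partition circuit breaker. It is an illustration only, not the design proposed in this issue and not part of the Kafka client API: the class name PartitionCircuitBreaker, its methods, and the parameters failureThreshold and retryAfterMs are all hypothetical. It assumes the caller reports the outcome of each write and passes in the current time.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-partition circuit breaker; not part of the Kafka client API. */
class PartitionCircuitBreaker {
    private static final class State {
        int consecutiveFailures;
        long openedAtMs = -1L; // -1 means the circuit is closed (partition healthy)
    }

    private final int failureThreshold; // assumed: failures in a row before tripping
    private final long retryAfterMs;    // assumed: cool-down before probing again
    private final Map<Integer, State> states = new HashMap<>();

    PartitionCircuitBreaker(int failureThreshold, long retryAfterMs) {
        this.failureThreshold = failureThreshold;
        this.retryAfterMs = retryAfterMs;
    }

    /** Record the outcome of one write to the given partition. */
    synchronized void record(int partition, boolean success, long nowMs) {
        State s = states.computeIfAbsent(partition, p -> new State());
        if (success) {
            // Any success closes the circuit and resets the failure count.
            s.consecutiveFailures = 0;
            s.openedAtMs = -1L;
        } else {
            s.consecutiveFailures++;
            if (s.consecutiveFailures >= failureThreshold) {
                // Open the circuit, or re-open it if a probe write just failed.
                s.openedAtMs = nowMs;
            }
        }
    }

    /** True if writes to the partition should currently be attempted. */
    synchronized boolean isAvailable(int partition, long nowMs) {
        State s = states.get(partition);
        if (s == null || s.openedAtMs < 0) {
            return true;
        }
        // Once the cool-down has elapsed, let probe writes through (half-open).
        return nowMs - s.openedAtMs >= retryAfterMs;
    }

    /**
     * Return the first available partition, scanning from the preferred one.
     * Falls back to the preferred partition if every circuit is open.
     */
    synchronized int choose(int preferred, int numPartitions, long nowMs) {
        for (int i = 0; i < numPartitions; i++) {
            int candidate = (preferred + i) % numPartitions;
            if (isAvailable(candidate, nowMs)) {
                return candidate;
            }
        }
        return preferred;
    }
}
```

A producer-side caller could consult choose() before appending a record and call record() from the send callback. Rerouting of this kind changes which partition a record lands in, so it would presumably only be safe for records that do not depend on key-based partition placement.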