Hello Chen,

Thank you for the KIP.
I have a few questions/thoughts:

1. Why can't we achieve the objective without making any change at all? For example, you could designate a few brokers as "canary brokers" on which your custom "canary partitions" are situated. During a rolling deployment you can deploy changes to these brokers first; if the health of your canary partitions is good, you continue with the rest of the deployment. Similarly, one consumer group could be designated the "canary consumer group" and consume from the canary partitions.

2. What do you think about having a separate canary cluster where you deploy code first, before deploying to the production cluster? The canary cluster could receive a small portion of "shadow" production traffic or have its own synthetic traffic.

3. Would the controller broker be part of the canary brokers or not? How would we test a code regression in the controller? Similarly, how would we test code regressions in the transaction coordinator and the consumer coordinator?

— Divij Vaidya

On Fri, Jan 3, 2025 at 7:41 AM Chen Zhifeng <ericzhifengc...@gmail.com> wrote:

> Hi Everyone,
>
> Started a thread to discuss KIP-1095: Kafka Canary Isolation (link
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1095%3A+Kafka+Canary+Isolation
> >
> )
>
> Canary isolation aims to improve Kafka service quality by reducing the
> blast radius of a bad Kafka deployment to a small portion of traffic.
>
> The key parts of the solution:
> 1. Define canary brokers (a new piece of broker metadata, "pod", is
> introduced).
> 2. Define canary partitions - a small portion of partitions placed on
> canary brokers.
> 3. Producers/consumers use topic metadata to route canary traffic to, and
> isolate it on, the canary brokers.
>
> With canary isolation, we expect to detect deployment-caused failures in
> the canary and roll back before they impact the whole of production.
>
> Regards,
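For readers following along, the routing idea in step 3 of the quoted KIP summary can be sketched roughly as below. This is a hypothetical illustration only, not the KIP's actual implementation: the function name, the 10% canary fraction, and the choice of which partitions are "canary" are all assumptions made up for the example.

```python
import random

def choose_partition(partitions, canary_partitions, canary_fraction, rng=random):
    """Hypothetical sketch: route a small, configurable fraction of
    produce traffic to partitions hosted on canary brokers, and the
    rest to the regular partitions. Not the KIP's real algorithm."""
    regular = [p for p in partitions if p not in canary_partitions]
    if canary_partitions and rng.random() < canary_fraction:
        return rng.choice(sorted(canary_partitions))
    return rng.choice(regular)

# Example: a topic with 6 partitions, where partition 5 happens to live
# on a canary broker, and we send ~10% of traffic to the canary.
parts = list(range(6))
canary = {5}
sample = [choose_partition(parts, canary, canary_fraction=0.1)
          for _ in range(10000)]
canary_share = sample.count(5) / len(sample)
```

The point of the sketch is the blast-radius argument in the KIP: only the `canary_fraction` slice of traffic can be affected by a bad deployment on the canary brokers, so a regression surfaces there before the fleet-wide rollout continues.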