Hello Chen,

Thank you for the KIP.
I have a few questions/thoughts:

1. Why can't we achieve the objective without making any change at all? For example, you could designate a few brokers as "canary brokers" on which your custom "canary partitions" are situated. During a rolling deployment you can deploy changes to these brokers first; if the health of your canary partitions is good, you continue with the rest of the deployment. Similarly, one consumer group could be designated the "canary consumer group" and consume from the canary partitions.

2. What do you think about having a separate canary cluster where you deploy code first, before deploying to the production cluster? The canary cluster could receive a small portion of "shadow" production traffic or have its own synthetic traffic.

3. Would the controller broker be part of the canary brokers or not? How would we test a code regression in the controller? Similarly, how would we test code regressions in the transaction coordinator and the consumer coordinator?

— Divij Vaidya

On Fri, Jan 3, 2025 at 7:41 AM Chen Zhifeng <ericzhifengc...@gmail.com> wrote:

> Hi Everyone,
>
> Started a thread to discuss KIP-1095: Kafka Canary Isolation (link
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1095%3A+Kafka+Canary+Isolation
> >
> )
>
> Canary isolation aims to improve Kafka service quality by reducing the
> blast radius of a bad Kafka deployment to a small portion of traffic.
>
> The key parts of the solution:
> 1. Define canary brokers (a new piece of broker metadata, "pod", is
> introduced).
> 2. Define canary partitions - a small portion of partitions placed on
> canary brokers.
> 3. Producers/consumers use topic metadata to route canary traffic to, and
> isolate it on, the canary brokers.
>
> With canary isolation, we expect to detect deployment-caused failures in
> the canary and roll back before they impact the whole of production.
>
> Regards,
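For readers following along, the routing idea in step 3 of the quoted KIP summary can be sketched roughly as below. This is a hypothetical illustration only, not the KIP's actual implementation: the function name, the 10% canary fraction, and the choice of which partitions are "canary" are all assumptions made up for the example.

```python
import random

def choose_partition(partitions, canary_partitions, canary_fraction, rng=random):
    """Hypothetical sketch: route a small, configurable fraction of
    produce traffic to partitions hosted on canary brokers, and the
    rest to the regular partitions. Not the KIP's real algorithm."""
    regular = [p for p in partitions if p not in canary_partitions]
    if canary_partitions and rng.random() < canary_fraction:
        return rng.choice(sorted(canary_partitions))
    return rng.choice(regular)

# Example: a topic with 6 partitions, where partition 5 happens to live
# on a canary broker, and we send ~10% of traffic to the canary.
parts = list(range(6))
canary = {5}
sample = [choose_partition(parts, canary, canary_fraction=0.1)
          for _ in range(10000)]
canary_share = sample.count(5) / len(sample)
```

The point of the sketch is the blast-radius argument in the KIP: only the `canary_fraction` slice of traffic can be affected by a bad deployment on the canary brokers, so a regression surfaces there before the fleet-wide rollout continues.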