Thanks for the ideas and the offer to help out Ryan! It’s invaluable to learn about how other users/teams scenarios.
Indeed I have to pursue and evaluate the GateInjectorExecutor nonetheless for our internal development. Ideally it’d be great if the user can simply “invoke” the functionality, even give the user an option to specify “where" the gating mechanism to be placed (e.g. right after the source or before sink), or for the most flexibility they can incorporate and place the Gate manually just like it is now. I’ll share the progress asap and we can all take it from there (this should not exceed a couple weeks). I’ll definitely need more of your feedback to verify this with Flink SQL. Thanks again, Sergio > On Mar 16, 2026, at 7:01 AM, Ryan van Huuksloot > <[email protected]> wrote: > > Hi Sergio, > > re: 1.1 > My thought is that a BlueGreen Mixin isn't Kubernetes specific and could be > reused by other deployment control planes. However, I do agree that attaching > it to the sink has other implications so I am happy to pivot if we can find > an alternative solution. > > re: 2 > I'm happy to leave it out of the Phase 2 implementation, but I think it > should be possible. For example we use Phase 1 with cross cluster migrations > today. Phase 2 within a single cluster isn't particularly useful for us. > > re: GateInjectorExecutor > This sounds like a neat idea. I need to read more about how it would work but > from a high level, injecting an operator before your sinks sounds like a good > idea. Better isolation, possible with SQL, no mixins, etc. > > I will mention that part of the reason I want it before the sinks is because > nine out of ten people building pipelines struggle to understand where their > state is and how Phase 2 would affect the correctness of their state > depending on where they put the gate. I understand that if you have a remote > lookup and want to save bandwidth, you could optimize your pipeline by moving > the gate before the remote call; however, that seems like an optimization > that can be made later. > > Thanks for driving this! Let me know how we can help. > > Ryan van Huuksloot > Staff Engineer, Infrastructure | Streaming Platform > <https://www.shopify.com/?utm_medium=salessignatures&utm_source=hs_email> > > On Mon, Mar 16, 2026 at 2:21 AM Sergio Chong Loo <[email protected] > <mailto:[email protected]>> wrote: >> Hi Ryan >> >> Thanks a lot for these details. For sure some of these observations popped >> up during our initial discussions, and that’s why our initial goal was to >> introduce this as simple as possible and gradually enhance it to cover gaps. >> >> Allow me to address your concerns: >> I’m happy you stressed the point of “disruption to existing pipelines”. >> However, there’s a few points about attempting to build this functionality >> into the sinks (or sources) right off the bat (read further below for my >> alternative): >> Kubernetes centric: as of now the Blue/Green Deployments support is a >> Kubernetes specific solution, adding a mixin directly available to sinks >> would “leak” this support outside of K8s >> A sink being aware of these deployment phases violates single >> responsibility, but more importantly… >> Flink currently has many connectors, with the majority being maintained >> outside of the Flink code base, by separate teams, separate repos, separate >> release cycles. This would complicate things significantly as to try and add >> support for this for every potential flink connector project out there would >> be a cumbersome. Blue/Green Phase 2 then only would works with "gate-aware" >> sinks. >> I’d leave the conversation about migrating jobs between K8s clusters outside >> of this scope, even Phase 1 is meant to only work in a single cluster… >> Watermarking, excellent point, it’s indeed a requirement so I’ll make sure >> this is validated where applicable (by the concrete implementation) >> >> Having said what I said about point 1.1 above, I’m currently working on an >> approach which uses a “GateInjectorPipelineExecutor” so to speak; in other >> words a custom PipelineExecutor that would be shipped with the K8s Operator, >> invoked by Flink Configuration (via “execution.target:”). This custom piece >> would instantiate and inject the Gate at a fixed point in the StreamGraph >> right before job submission. I still have to validate and ensure a few >> things are correctly taken care of (like Type Information, etc.) but the >> theory looks promising. >> >> For the most part this works well with Flink SQL (same configuration), >> here’s my estimation: >> >> tEnv.executeSql("INSERT INTO my_sink ...") >> └─> SQL planner → ExecNodeGraph → Transformation[] >> └─> StreamGraph >> └─> GateInjectorExecutor injects GateProcessFunction >> └─> StreamGraph' (mutated) → JobGraph >> └─> Submit Job >> >> I’m aiming to share some updates along these lines in the next few weeks but >> hopefully this falls inline with your objectives/thoughts overall. >> >> Sergio >> >> >>> On Mar 6, 2026, at 3:36 PM, Ryan van Huuksloot via dev >>> <[email protected] <mailto:[email protected]>> wrote: >>> >>> Hi Sergio, >>> Thanks for starting this conversation. >>> >>> A few thoughts regarding BlueGreen Phase 2: >>> 1. The Gate Operator is interesting but I don't like that we would have to >>> modify users' pipelines for them to use Phase 2. This gate function seems >>> like it could be a Mixin that connectors would implement. If you want to >>> use Phase 2, your sinks must implement this Mixin. I understand that a >>> unique GateFunction has pros, but it works less well with FlinkSQL - and >>> the trade-off doesn't seem worthwhile. >>> 2. Regarding the ConfigMap. We should consider a solution that supports >>> migrating Flink jobs between Kubernetes clusters. Otherwise Phase 2 is only >>> useful for in cluster operations. >>> 3. Watermarking is a requirement. Will the Flink Kubernetes Operator >>> validate that the pipeline is using watermarks? >>> >>>> What happens when idleness is configured? Watermarks will get ignored from >>> these “slow” subtasks and advance, could records from the ignored subtasks >>> eventually be lost? >>> Yes they would be lost, but that would happen irrespective of Phase 2. >>> >>> I'll have more thoughts after we discuss the Gate Operator, as that is >>> crucial to the FLIP right now. >>> >>> Ryan van Huuksloot >>> Staff Engineer, Infrastructure | Streaming Platform >>> [image: Shopify] >>> <https://www.shopify.com/?utm_medium=salessignatures&utm_source=hs_email> >>> >>> >>> On Mon, Mar 2, 2026 at 6:52 PM Sergio Chong Loo <[email protected] >>> <mailto:[email protected]>> wrote: >>> >>>> Bumping this (Advanced Blue/Green deployments - FLIP-504) thread after >>>> making some code adjustments. >>>> >>>> FYI @drossos <https://github.com/drossos> @ryanvanhuuksloot < >>>> https://github.com/ryanvanhuuksloot> I’d like to get your feedback since >>>> I know you’re interested in this feature. >>>> >>>> Thanks, >>>> - Sergio >>>> >>>> >>>>> On Dec 5, 2025, at 2:31 PM, Sergio Chong Loo <[email protected] >>>>> <mailto:[email protected]>> >>>> wrote: >>>>> >>>>> Hi folks, >>>>> >>>>> FLIP-503 (already merged) introduced the Basic Blue/Green Deployment >>>> functionality to the Flink K8s Operator. It was very straightforward, >>>> simply transitioning to the second deployment once it's considered stable. >>>>> >>>>> FLIP-504 is an Advanced version added on top of 503 and brings about the >>>> notion of "record-level" coordination between the 2 deployments to have no >>>> data duplication and exactly once semantics while preserving a smooth >>>> transition. >>>>> >>>>> The main goals are: >>>>> • For the community to take a quick look at the current >>>> functionality (previously mentioned at the Flink Forward 2025 Conference) >>>>> • To get feedback and improvement suggestions >>>>> >>>>> Flip 504 details: >>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=337677650 >>>>> >>>>> Draft PR: https://github.com/apache/flink-kubernetes-operator/pull/1043 >>>>> >>>>> Thank you! >>>>> - Sergio >>>>> >>>>> >>>> >>>> >>
