Draft PR)

Jing-Jia Hung Sun, 21 Jun 2026 14:57:55 -0700

Hi Sergio,

Late to the thread. The gate auto injection via the Java agent is cool.


Most of the discussion so far has centered on data-plane correctness. The
area I'd like to understand better is the control-plane side, specifically
recovery when a transition doesn't complete. The FLIP notes the controller
can be left in a bad state if a deployment fails mid-transition. From
reading the draft it looks like the teardown step waits on the gate
signaling CLEAR_TO_TEARDOWN. If the gate stalls (watermark never advances,
or the job fails before it signals), how do you picture the transition
unwinding? Is there a deadline that forces a rollback, or is that part of
the error handling still being worked out?

Also, I wonder how this approach would work with Pyflink DataStream jobs.
My understanding is that Java agent may not be able to inject the gate in
that case. Curious to hear your thoughts.

Thank you!
Jing



On 2026/05/04 17:56:58 Sergio Chong Loo wrote: > Hi Daniel / Ryan, > >
Again thanks a lot for the feedback. Here are some thoughts: > > The
following are excellent points and I indeed think we can work on them
immediately to incorporate them. I already have some ideas on
implementation but let me flesh them out more before I share them: > -
WatermarkGenerator > - Gate Parallelism > - TransitionMode default
(trivial) > > Thanks for the Misc Testing proposal (especially with SQL),
it will indeed facilitate development and make sure the outcome/output is
precise. Let me know if/when you choose to start this effort (or any other
item for that matter). > > Exactly-once sinks. This needs explicit testing
indeed. I haven't verified this one yet. I believe the gate operator should
ensure green's sink transactions don't begin until blue has completed its
final checkpoint and committed. This is a good candidate for a dedicated
test case before declaring phase 1 stable. > > Non-idempotent sinks. I'm
still not sure this is a problem meant to be solved by Blue/Green
deployments. If we think closely about it, even within a single Flink
pipeline, strict global ordering is only guaranteed within a single
subtask's partition. The moment you have parallelism > 1, different
subtasks process different keys/partitions independently and emit records
at different rates. There's no global wall-clock ordering across subtasks.
Flink's watermark mechanism gives you event-time ordering, not
arrival-order consistency across the whole topology. For pipelines where
strict ordering to a non-idempotent sink is a hard requirement, the right
pattern is a savepoint based stop/restart rather than concurrent blue/green
execution. > > On the other hand, assuming we pursue this, your proposal
sounds straightforward, but using state would imply the new pipeline needs
to buffer an unbounded amount of data while waiting for the first
deployment to finish, which makes memory management and state sizing very
difficult. The gate watermark barrier already provides a temporal ordering
guarantee (green doesn't advance past blue's watermark), which covers the
majority of idempotency-sensitive sinks. Perhaps I'm looking at this from
an overly simplistic perspective, is there a concrete example you can share
to illustrate the scenario you have in mind? We should definitely keep this
topic open as we make progress on the other items. > > I'll keep you posted
on progress for the items at the top. > > ⁃ Sergio > > > > On Apr 20, 2026,
at 8:40 AM, Sergio Chong Loo <[email protected]> wrote: > > > > Hi @Daniel /
@Ryan, > > > > Thanks a lot for the input. Similarly we’re swamped in a
time crunch here but I’ll be taking a deep dive into your feedback
hopefully before EoW. > > > > Stay tuned! > > > > - Sergio > > > >> On Apr
15, 2026, at 3:57 PM, Daniel Rossos <[email protected]> wrote: > >> > >>
Hey Sergio, > >> > >> I got around to running locally as well as doing a
deeper dive into the current implementation details (Sorry about the
delay). This all looks super awesome and huge thanks for taking the time to
put this together. I have some comments / questions below. Some I think
will be answered as we test further and some are potential next steps /
features that might be nice for phase 2. > >> > >> Non-Idempotent sinks >
>> I believe we will still have non-idempotent sinks issues by using these
gate functions. Brief recap of when this was raised before, records in the
“green” deployment could be produced before the “blue” deployment causing
an out-of-order delivery in record processing order which could have
implications for downstream sinks. One potential solution I was thinking
about that fits into this PR was to have the gate-operator be stateful and
have your “green” pipeline accumulate messages after watermark barrier and
wait for the “blue” to communicate (via the configmap) that it is done
processing before passing its records on. This would introduce a minimal
processing delay (poll rate of the “green” on configmap), but would ensure
that the ordering of the streams to down stream sinks remains consistent.
Other complications arise here with regards to handling state, but want to
get your opinion here. > >> > >> WatermarkGenerator > >> Inserting a
watermark generator instead of requiring watermark in data. On the SQL side
of things I noticed it is required that a field be a timestamp field that
can be converted into a watermark. I was wondering if an alternative would
be to inject a watermark generator step (generate based off some other
value), that way existing table defs won’t need to be changed to
accommodate this new feature. > >> > >> Exactly-once-sinks > >> How does
this work with exactly_once downstream sinks compatibility wise? For
example we should test using exactly_once kafka sinks to see if there will
be any conflicts there. > >> > >> Gate Parallelism > >> From my
understanding, the parallelism of the gate operator has to be same as the
sink/source operator it is tied to. With autoscaling how do these stay in
sync? Could this cause problems? Can this gate become a performance
bottleneck somehow? > >> > >> TransitionMode Default > >> `transitionMode`
not being set to default `BASIC` means all phase 1 Blue-Green deployment
would be broken specs on upgrade. I think we will want to include that
default > >> > >> Misc Testing > >> This is something for the future (and
something I might play around with as I test), is adding a case to the
e2e-tests that creates a Flink pipeline (sql and/or non-sql) and checks
output from sink to ensure there are no duplicates. I don’t have a
convenient pipeline on hand > >> > >> Going forward, I am going to try to
get a good test pipeline created and test this on our sandbox environment
to see if I can get a good e2e run on prod-like conditions. > >> > >>
Thanks again, > >> > >> Daniel > >> > >> On Wed, Apr 1, 2026 at 12:25 PM
Sergio Chong Loo <[email protected] <[email protected]>> wrote: > >>>
Absolutely no rush, take your time. > >>> > >>> I’m also still working on
some details around error handling. I’ll reach out offline so you can
always work on the latest. > >>> > >>> Thank you both as well for the
feedback! > >>> > >>> - Sergio > >>> > >>> > >>>> On Apr 1, 2026, at
8:16 AM, Daniel Rossos <[email protected] <[email protected]>>
wrote: > >>>> > >>>> Hi Sergio, > >>>> > >>>> Sorry about the delay on my
end (just got back after a few weeks off). I have caught up with + agree
with all the feedback Ryan provided in this thread. > >>>> > >>>> The Gate
auto-injection idea is really cool. I'm going to try and make some time
this week to test out your PR on my side and provide feedback and thoughts.
> >>>> > >>>> Thanks again for driving this, > >>>> Daniel > >>>> > >>>> >
>>>> On Thu, Mar 26, 2026 at 11:13 AM Ryan van Huuksloot via dev <
[email protected] <[email protected]>> wrote: > >>>>> Awesome!
Thanks for the update, Sergio. I'm excited to see the plan - it is > >>>>>
a cool idea so I'm glad it is working. > >>>>> > >>>>> Ryan van Huuksloot >
>>>>> Staff Engineer, Infrastructure | Streaming Platform > >>>>> [image:
Shopify] > >>>>> <
https://www.shopify.com/?utm_medium=salessignatures&utm_source=hs_email> >
>>>>> > >>>>> > >>>>> On Thu, Mar 26, 2026 at 1:46 AM Sergio Chong Loo <
[email protected] <[email protected]>> > >>>>> wrote: > >>>>> > >>>>> >
@Ryan / @Daniel, > >>>>> > > >>>>> > Good news! The
“GateInjectorPipelineExecutor” idea is successful!! > >>>>> > > >>>>> >
While the original approach of simply activating it with > >>>>> >
“execution.target” did not quite work, I was able to implement it via the
*Instrumentation > >>>>> > API with a Java Agent* that injects it… the user
doesn’t have to touch > >>>>> > their pipelines and I added 2 options, at
least for now, to place/inject > >>>>> > the Gate after the source or
before the sink (complex DAG cases with > >>>>> > multiple sources or sinks
for now are not supported). > >>>>> > > >>>>> > I’m documenting everything
and prepping the Draft PR for your review, > >>>>> > probably a couple more
days. > >>>>> > > >>>>> > Thanks, stay tuned. > >>>>> > > >>>>> > - Sergio
> >>>>> > > >>>>> > > >>>>> > On Mar 16, 2026, at 3:30 PM, Sergio Chong Loo
<[email protected] <[email protected]>> wrote: > >>>>> > > >>>>> > Thanks
for the ideas and the offer to help out Ryan! It’s invaluable to > >>>>> >
learn about how other users/teams scenarios. > >>>>> > > >>>>> > Indeed I
have to pursue and evaluate the GateInjectorExecutor nonetheless > >>>>> >
for our internal development. Ideally it’d be great if the user can simply
> >>>>> > “invoke” the functionality, even give the user an option to
specify “where" > >>>>> > the gating mechanism to be placed (e.g. right
after the source or before > >>>>> > sink), or for the most flexibility
they can incorporate and place the Gate > >>>>> > manually just like it is
now. > >>>>> > > >>>>> > I’ll share the progress asap and we can all take
it from there (this > >>>>> > should not exceed a couple weeks). I’ll
definitely need more of your > >>>>> > feedback to verify this with Flink
SQL. > >>>>> > > >>>>> > Thanks again, > >>>>> > Sergio > >>>>> > > >>>>> >
> >>>>> > On Mar 16, 2026, at 7:01 AM, Ryan van Huuksloot < > >>>>> >
[email protected] <[email protected]>> wrote: > >>>>> > > >>>>>
> Hi Sergio, > >>>>> > > >>>>> > re: 1.1 > >>>>> > My thought is that a
BlueGreen Mixin isn't Kubernetes specific and could > >>>>> > be reused by
other deployment control planes. However, I do agree that > >>>>> >
attaching it to the sink has other implications so I am happy to pivot if >
>>>>> > we can find an alternative solution. > >>>>> > > >>>>> > re: 2 >
>>>>> > I'm happy to leave it out of the Phase 2 implementation, but I
think it > >>>>> > should be possible. For example we use Phase 1 with
cross cluster > >>>>> > migrations today. Phase 2 within a single cluster
isn't particularly useful > >>>>> > for us. > >>>>> > > >>>>> > re:
GateInjectorExecutor > >>>>> > This sounds like a neat idea. I need to read
more about how it would work > >>>>> > but from a high level, injecting an
operator before your sinks sounds like > >>>>> > a good idea. Better
isolation, possible with SQL, no mixins, etc. > >>>>> > > >>>>> > I will
mention that part of the reason I want it before the sinks is > >>>>> >
because nine out of ten people building pipelines struggle to understand >
>>>>> > where their state is and how Phase 2 would affect the correctness
of their > >>>>> > state depending on where they put the gate. I understand
that if you have a > >>>>> > remote lookup and want to save bandwidth, you
could optimize your pipeline > >>>>> > by moving the gate before the remote
call; however, that seems like an > >>>>> > optimization that can be made
later. > >>>>> > > >>>>> > Thanks for driving this! Let me know how we can
help. > >>>>> > > >>>>> > Ryan van Huuksloot > >>>>> > Staff Engineer,
Infrastructure | Streaming Platform > >>>>> > [image: Shopify] > >>>>> > <
https://www.shopify.com/?utm_medium=salessignatures&utm_source=hs_email> >
>>>>> > > >>>>> > > >>>>> > On Mon, Mar 16, 2026 at 2:21 AM Sergio Chong
Loo <[email protected] <[email protected]>> > >>>>> > wrote: > >>>>> > >
>>>>> >> Hi Ryan > >>>>> >> > >>>>> >> Thanks a lot for these details. For
sure some of these observations > >>>>> >> popped up during our initial
discussions, and that’s why our initial goal > >>>>> >> was to introduce
this as simple as possible and gradually enhance it to > >>>>> >> cover
gaps. > >>>>> >> > >>>>> >> Allow me to address your concerns: > >>>>> >> >
>>>>> >> 1. I’m happy you stressed the point of “disruption to existing >
>>>>> >> pipelines”. However, there’s a few points about attempting to
build this > >>>>> >> functionality into the sinks (or sources) right off
the bat (read further > >>>>> >> below for my alternative): > >>>>> >> 1.
Kubernetes centric: as of now the Blue/Green Deployments > >>>>> >> support
is a Kubernetes specific solution, adding a mixin directly > >>>>> >>
available to sinks would “leak” this support outside of K8s > >>>>> >> 2. A
sink being aware of these deployment phases violates single > >>>>> >>
responsibility, but more importantly… > >>>>> >> 3. Flink currently has
many connectors, with the majority being > >>>>> >> maintained outside of
the Flink code base, by separate teams, separate > >>>>> >> repos, separate
release cycles. This would complicate things significantly > >>>>> >> as to
try and add support for this for every potential flink connector > >>>>> >>
project out there would be a cumbersome. Blue/Green Phase 2 then only would
> >>>>> >> works with "gate-aware" sinks. > >>>>> >> 2. I’d leave the
conversation about migrating jobs between K8s > >>>>> >> clusters outside
of this scope, even Phase 1 is meant to only work in a > >>>>> >> single
cluster… > >>>>> >> 3. Watermarking, excellent point, it’s indeed a
requirement so I’ll > >>>>> >> make sure this is validated where applicable
(by the concrete > >>>>> >> implementation) > >>>>> >> > >>>>> >> > >>>>>
>> Having said what I said about point 1.1 above, I’m currently working on
> >>>>> >> an approach which uses a “GateInjectorPipelineExecutor” so to
speak; in > >>>>> >> other words a custom PipelineExecutor that would be
shipped with the K8s > >>>>> >> Operator, invoked by Flink Configuration
(via “execution.target:”). This > >>>>> >> custom piece would instantiate
and inject the Gate at a fixed point in the > >>>>> >> StreamGraph right
before job submission. I still have to validate and > >>>>> >> ensure a few
things are correctly taken care of (like Type Information, > >>>>> >> etc.)
but the theory looks promising. > >>>>> >> > >>>>> >> For the most part
this works well with Flink SQL (same configuration), > >>>>> >> here’s my
estimation: > >>>>> >> > >>>>> >> tEnv.executeSql("INSERT INTO my_sink
...") > >>>>> >> └─> SQL planner → ExecNodeGraph → Transformation[] > >>>>>
>> └─> StreamGraph > >>>>> >> └─> GateInjectorExecutor injects
GateProcessFunction > >>>>> >> └─> StreamGraph' (mutated) → JobGraph >
>>>>> >> └─> Submit Job > >>>>> >> > >>>>> >> I’m aiming to share some
updates along these lines in the next few weeks > >>>>> >> but hopefully
this falls inline with your objectives/thoughts overall. > >>>>> >> > >>>>>
>> Sergio > >>>>> >> > >>>>> >> > >>>>> >> On Mar 6, 2026, at 3:36 PM, Ryan
van Huuksloot via dev < > >>>>> >> [email protected] <
[email protected]>> wrote: > >>>>> >> > >>>>> >> Hi Sergio, > >>>>> >>
Thanks for starting this conversation. > >>>>> >> > >>>>> >> A few thoughts
regarding BlueGreen Phase 2: > >>>>> >> 1. The Gate Operator is interesting
but I don't like that we would have to > >>>>> >> modify users' pipelines
for them to use Phase 2. This gate function seems > >>>>> >> like it could
be a Mixin that connectors would implement. If you want to > >>>>> >> use
Phase 2, your sinks must implement this Mixin. I understand that a > >>>>>
>> unique GateFunction has pros, but it works less well with FlinkSQL - and
> >>>>> >> the trade-off doesn't seem worthwhile. > >>>>> >> 2. Regarding
the ConfigMap. We should consider a solution that supports > >>>>> >>
migrating Flink jobs between Kubernetes clusters. Otherwise Phase 2 is >
>>>>> >> only > >>>>> >> useful for in cluster operations. > >>>>> >> 3.
Watermarking is a requirement. Will the Flink Kubernetes Operator > >>>>>
>> validate that the pipeline is using watermarks? > >>>>> >> > >>>>> >>
What happens when idleness is configured? Watermarks will get ignored from
> >>>>> >> > >>>>> >> these “slow” subtasks and advance, could records from
the ignored subtasks > >>>>> >> eventually be lost? > >>>>> >> Yes they
would be lost, but that wo [message truncated...]

RE: Re: FLIP-504: Blue/Green Deployments for Flink on Kubernetes - Phase 2 (WIP/Draft PR)

Reply via email to