Hi Sergio,

Thanks for the proposal. There are certainly many corner cases to address when switching from a Green to a Blue deployment while keeping the data flowing. Some points that I believe haven't been listed yet include:
- Should this reusable ProcessFunction filter out data or simply tag them? For example, filtering data for green deployments would be suitable in the case of Async IO outputs.
- Should we also modify the Source API to allow it to provide COLOR to the Source functions? This could be used, for instance, to generate a different groupId for Kafka.

Additionally, have you considered including a validation mechanism for Blue-Green deployment? For example, allowing users to perform automatic validation before promoting Green to Blue. After Green is deployed, we could wait for an external ConfigMap change or a property change in the FlinkBlueGreenDeployment CRD to start the promotion.

Regards,
Oleksandr

On Thu, Dec 5, 2024 at 8:32 PM Sergio Chong Loo <schong...@apple.com.invalid> wrote:

> Thanks everyone for your input and encouragement!
>
> Yes all of these are questions that have popped up one way or another, it won’t be easy to generalize this solution but hopefully we can devise a good initial/extensible architecture.
>
> I’ll get to work on the FLIP and make sure all these points are included.
>
> - Sergio
>
>
> On Dec 5, 2024, at 5:34 AM, Alexander Fedulov <alexander.fedu...@gmail.com> wrote:
> >
> > Hi Sergio,
> >
> > Thank you for starting the discussion. Blue/Green deployment support would indeed be a great feature.
> > +1 on initiating the FLIP.
> >
> > In addition to David's points about data consistency, I think it is also important to mention in the FLIP that this deployment procedure will probably not be compatible with changes in parallelism between blue and green. Such changes could potentially lead to differences in watermark propagation and discrepancies in the produced data between the two jobs.
> >
> > Best,
> > Alex
> >
> > On Wed, 4 Dec 2024 at 12:36, David Radley <david_rad...@uk.ibm.com> wrote:
> >
> >> Hi Sergio,
> >> +1 for starting a FLIP.
> >> I am wondering what side effects non-zero values for table.exec.source.idle-timeout and table.exec.state.ttl would have, as they are based on clock time. It will be interesting to identify these less stable scenarios and see what we can do with them.
> >>
> >> Kind regards, David.
> >>
> >>
> >> From: Maximilian Michels <m...@apache.org>
> >> Date: Wednesday, 4 December 2024 at 09:56
> >> To: schong...@apple.com.invalid <schong...@apple.com.invalid>
> >> Cc: dev@flink.apache.org <dev@flink.apache.org>
> >> Subject: [EXTERNAL] Re: Blue/Green Deployments support for Flink
> >> Hi Sergio,
> >>
> >> Out of the box blue/green deployments would be a great addition to Flink.
> >>
> >> +1 for starting a FLIP. That will allow us to better describe the architecture and flesh out the technical details. I reckon the handover between the two pipelines is going to be the most difficult part. Particularly, how to ensure that both pipelines are in sync with respect to the point of handover.
> >>
> >> -Max
> >>
> >> On Tue, Dec 3, 2024 at 11:32 PM Sergio Chong Loo <schong...@apple.com.invalid> wrote:
> >>
> >>> Hi Danny… and community,
> >>>
> >>> At this point would it make sense for me to begin putting a FLIP together with more details and perhaps continue the conversation from there?
> >>>
> >>> Thanks,
> >>> Sergio
> >>>
> >>>> On Nov 30, 2024, at 7:29 AM, Sergio Chong Loo <schong...@apple.com> wrote:
> >>>>
> >>>> Hey Danny,
> >>>>
> >>>> Thanks for digging deeper into this topic!
> >>>>
> >>>> Indeed we’ve been giving thought to most of these points, but you’ve raised a couple of interesting ones. Here are some ideas (inline):
> >>>>
> >>>>
> >>>>> On Nov 27, 2024, at 4:00 AM, Danny Cranmer <dannycran...@apache.org> wrote:
> >>>>>
> >>>>> Hello Sergio,
> >>>>>
> >>>>> Thank you for starting this discussion, I have a few questions.
> >>>>>
> >>>>>> having 2 identical pipelines running side-by-side
> >>>>> How do you ensure correctness between the 2 identical pipelines? For example, processing time semantics or late data can result in different outputs.
> >>>>> Some Sources have consumption quotas, for example Kinesis Data Streams; this may end up eating into this quota and cause problems.
> >>>>> How do we handle sources like Queues when they cannot be consumed concurrently?
> >>>>
> >>>> Yes, for this analysis we’ve been staying away from Processing Time since it’s hard (even for a single pipeline) to even “replay” idempotently. The most stable scenario so far has been Event Time with Watermarks (the solution will be extensible to accommodate other non-Watermark scenarios).
> >>>>
> >>>> The Blue deployment should start from Green’s most recent Checkpoint; that way we minimize the amount of time it needs to “catch up”, and with Event Time it is easier to ensure that catch-up portion will be replayed virtually the same way as Green’s.
> >>>>
> >>>> Since start-up times, checkpointing intervals and overall the nature of the data are strictly business specific, the user should have a configurable way (in this case via a “future” Watermark value) to indicate when the transition between blue/green will occur. In other words, both pipelines have the same configured “future” Watermark value to transition, both pipelines “see” the exact same events/records, therefore the Blue/Green Gates can both start/stop the record emission as soon as the configured Watermark is reached… they’re mutually exclusive so the records should pass through one gate or another.
> >>>>
> >>>> I don’t have experience yet with sources that cannot be consumed concurrently, this will be a good one to analyze.
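[The mutually exclusive gate decision described in the reply above can be modeled as a small plain-Java sketch. All names here are illustrative and not from the thread; in a real pipeline this check would presumably live inside the proposed GateProcessFunction and be evaluated against each subtask's current event-time watermark.]

```java
// Illustrative sketch (hypothetical names): the currently Active job keeps
// emitting while the watermark is still below the agreed "future" cutover
// value; the StandBy job starts emitting once the watermark reaches it.
// Because both jobs see the same events and share the same cutover value,
// each record passes through exactly one of the two gates.
class BlueGreenGate {
    enum Role { ACTIVE, STANDBY }

    static boolean shouldEmit(Role role, long subtaskWatermark, long cutoverWatermark) {
        boolean pastCutover = subtaskWatermark >= cutoverWatermark;
        // Active emits strictly before the cutover; StandBy emits from it onward.
        return (role == Role.ACTIVE) ? !pastCutover : pastCutover;
    }
}
```

[For any watermark value exactly one role emits, which is the mutual-exclusion property described above.]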
> >>>>
> >>>>>
> >>>>>> we explore the idea of empowering the pipeline to decide, at the record level, what data goes through and what doesn’t (by means of a “Gate” component).
> >>>>> What happens if the Green job gets ahead of the Blue job? How will you pick a stopping point (per channel) to stop at consistently (checkpoint barriers solve this already, right?). If you use Point in (event) Time, you need to handle idle channels.
> >>>>
> >>>> Hopefully the answer above addresses this; if not, I’m happy to add more clarification.
> >>>>
> >>>>>
> >>>>>> This new Controller will work in conjunction with a custom Flink Process Function
> >>>>> Would this rely on the user including the operator in the job graph, or would you somehow dynamically inject it?
> >>>>> Where would you put this operator in complex Job graphs or forests?
> >>>>
> >>>> Yes, initially an idea is to have a reusable ProcessFunction that the user can place in their pipelines wherever they see fit. So far it seems like a good idea to place it towards the end of the pipeline, e.g. before a sink; that way the majority of the state of that job can be exploited and preserved until we absolutely know we can tear the old Green pipeline down… or roll back.
> >>>>
> >>>> Another idea, but it would be more intrusive and harder to test in an initial iteration, is to add this as base Sink functionality; this way the sink could know whether to write the record(s) or not.
> >>>>
> >>>>>
> >>>>>> A ConfigMap will be used to both contain the deployment parameters as well as act as a communication conduit between the Controller and the two Jobs.
> >>>>> Will a config map write/read cycle be fast enough? If you write into the CM "stop at record x", then the other Job might read past record x before it sees the command.
> >>>>
> >>>> Oh, for sure this can happen.
> >>>> That’s why the user should have the capability of defining the custom “future” cutover point, because the time span their pipeline needs will be entirely business specific. For safety, our ProcessFunction could even enforce a minimum default “time to wait” or “minimum Watermark transition value” from the moment the pipeline started processing records, or something similar.
> >>>>
> >>>>>
> >>>>>> This function will listen for changes to the above mentioned ConfigMap and each job will know if it’s meant to be Active or Standby, at record level.
> >>>>> How will this work with exactly once sinks when you might have open transactions?
> >>>>
> >>>> The same principle remains: records should be written to either Blue’s transaction or to Green’s transaction.
> >>>>
> >>>> Another way to illustrate these components is:
> >>>>
> >>>> - Once a transition is initiated, Green’s Gate PF becomes StandBy and Blue’s Gate PF becomes Active, however they gradually open or close (mutually exclusive) in this fashion:
> >>>>
> >>>>     - Green’s Gate will taper off the emission of records as soon as each subtask’s Watermark crosses the configured transition value
> >>>>     - Blue’s Gate will gradually start emitting records as soon as each subtask’s Watermark crosses the configured transition value
> >>>>
> >>>>>
> >>>>> Thanks,
> >>>>> Danny
> >>>>>
> >>>>> On Mon, Nov 25, 2024 at 6:25 PM Sergio Chong Loo <schong...@apple.com.invalid> wrote:
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> As part of our work with Flink, we identified the need for a solution to have minimal “downtime” when re-deploying pipelines. This is especially the case when the startup times are high and lead to downtime that’s unacceptable for certain workloads.
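[The “minimum time to wait” safeguard mentioned in the reply above could amount to clamping the requested cutover watermark, so that a late ConfigMap read can never place the transition behind a job's progress. A hypothetical plain-Java sketch; these names are invented for illustration and are not part of the proposal.]

```java
// Hypothetical sketch of the safeguard discussed above: the effective cutover
// watermark is kept at least a minimum event-time margin ahead of the
// watermark at which the job started processing records.
class CutoverPolicy {
    /**
     * requestedCutover - the "future" watermark value written to the ConfigMap
     * startWatermark   - the watermark when this job began processing records
     * minMargin        - minimum event-time distance the cutover must keep ahead
     */
    static long effectiveCutover(long requestedCutover, long startWatermark, long minMargin) {
        // Clamp: the transition never happens earlier than start + margin.
        return Math.max(requestedCutover, startWatermark + minMargin);
    }
}
```

[A requested cutover that is already safe passes through unchanged; one that is too close, or in the past, is pushed out by the margin.]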
> >>>>>>
> >>>>>> One solution we thought about is the notion of Blue/Green deployments, which I conceptually describe in this email along with a potential list of modifications, the majority within the Flink K8s Operator.
> >>>>>>
> >>>>>> This is open for discussion; we’d love to get feedback from the community.
> >>>>>>
> >>>>>> Thanks!
> >>>>>> Sergio Chong
> >>>>>>
> >>>>>>
> >>>>>> Proposal to add Blue/Green Deployments support to the Flink K8s Operator
> >>>>>>
> >>>>>>
> >>>>>> Problem Statement
> >>>>>>
> >>>>>> As stateful applications, Flink pipelines require the data stream flow to be halted during deployment operations. For some cases stopping this data flow could have a negative impact on downstream consumers. The following figure helps illustrate this:
> >>>>>>
> >>>>>> A Blue/Green deployment architecture, or Active/StandBy, can help overcome this issue by having 2 identical pipelines running side-by-side with only one (the Active) producing or outputting records at all times.
> >>>>>>
> >>>>>>
> >>>>>> Moreover, since the pipelines are the ones “handling” each and every record, we explore the idea of empowering the pipeline to decide, at the record level, what data goes through and what doesn’t (by means of a “Gate” component).
> >>>>>>
> >>>>>>
> >>>>>> Proposed Modifications
> >>>>>>
> >>>>>> Give the Flink K8s Operator the capability of managing the lifecycle of these deployments.
> >>>>>> A new CRD (e.g. FlinkBlueGreenDeployment) will be introduced.
> >>>>>> A new Controller (e.g. FlinkBlueGreenDeploymentController) to manage this CRD and hide from the user the details of the actual Blue/Green (Active/StandBy) jobs.
> >>>>>> Delegate the lifecycle of the actual Jobs to the existing FlinkDeployment controller.
> >>>>>> At all times the controller “knows” which deployment type is/will become Active, from which point in time. This implies an Event Watermark driven implementation but with an extensible architecture to easily support other use cases.
> >>>>>> A ConfigMap will be used to both contain the deployment parameters as well as act as a communication conduit between the Controller and the two Jobs.
> >>>>>> This new Controller will work in conjunction with a custom Flink Process Function (potentially called GateProcessFunction) on the pipeline/client side. This function will listen for changes to the above mentioned ConfigMap and each job will know if it’s meant to be Active or Standby, at record level.
> >>>>>>
> >>>>
> >>>
> >>
> >> Unless otherwise stated above:
> >>
> >> IBM United Kingdom Limited
> >> Registered in England and Wales with number 741598
> >> Registered office: Building C, IBM Hursley Office, Hursley Park Road, Winchester, Hampshire SO21 2JN
> >