+1 (binding) -Max
On Tue, Aug 8, 2023 at 10:56 AM Etienne Chauchot <echauc...@apache.org> wrote:
>
> Hi all,
>
> As part of Flink bylaws, binding votes for FLIP changes are active
> committer votes.
>
> So far we have only 2 binding votes. Can one of the committers/PMC
> members vote on this FLIP?
>
> Thanks
>
> Etienne
>
>
> On 08/08/2023 at 10:19, Etienne Chauchot wrote:
> >
> > Hi Joseph,
> >
> > Thanks for the detailed review!
> >
> > Best
> >
> > Etienne
> >
> > On 14/07/2023 at 11:57, Prabhu Joseph wrote:
> >> *+1 (non-binding)*
> >>
> >> Thanks for working on this. We have seen good improvement with the
> >> cooldown period provided by this feature.
> >> Below are details on the test results from one of our clusters:
> >>
> >> On a scale-out operation, 8 new nodes were added one by one with a gap of
> >> ~30 seconds. There were 8 restarts within 4 minutes with the default
> >> behaviour, whereas there was only one with this feature (cooldown period
> >> of 4 minutes).
> >>
> >> The number of records processed by the job during the restart window is
> >> higher with this feature (2909764) than with the default behaviour
> >> (1323960). With the default behaviour the job restarts multiple times, so
> >> it spends most of the time recovering, and whatever progress the tasks
> >> made after the last successfully completed checkpoint is lost.
> >>
> >> Metrics              Default Adaptive Scheduler                        Adaptive Scheduler With Cooldown Period
> >> NumRecordsProcessed  1323960                                           2909764
> >> Job Parallelism      13 -> 20 -> 27 -> 34 -> 41 -> 48 -> 55 -> 62 -> 69  13 -> 69
> >> NumRestarts          8                                                 1
> >>
> >> Remarks:
> >>
> >> 1. The NumRecordsProcessed metric indicates the difference the cooldown
> >> period makes. When the job is doing multiple restarts, the task spends
> >> most of the time recovering, and the progress the task made is lost
> >> during the restart.
> >>
> >> 2. There is only one restart with the cooldown period, which happened
> >> when the 8th node got added back.
> >>
> >> On Wed, Jul 12, 2023 at 8:03 PM Etienne Chauchot <echauc...@apache.org>
> >> wrote:
> >>
> >>> Hi all,
> >>>
> >>> I'm going on vacation tonight for 3 weeks.
> >>>
> >>> Even if the vote is not finished, as the implementation is rather quick
> >>> and the design discussion has settled, I preferred to implement
> >>> FLIP-322 [1] to allow people to take a look while I'm off.
> >>>
> >>> [1] https://github.com/apache/flink/pull/22985
> >>>
> >>> Best
> >>>
> >>> Etienne
> >>>
> >>> On 12/07/2023 at 09:56, Etienne Chauchot wrote:
> >>>> Hi all,
> >>>>
> >>>> Would you mind casting your vote on this second vote thread (opened
> >>>> after new discussions) so that the subject can move forward?
> >>>>
> >>>> @David, @Chesnay, @Robert you took part in the discussions, could you
> >>>> please send your vote?
> >>>>
> >>>> Thank you very much
> >>>>
> >>>> Best
> >>>>
> >>>> Etienne
> >>>>
> >>>> On 06/07/2023 at 13:02, Etienne Chauchot wrote:
> >>>>> Hi all,
> >>>>>
> >>>>> Thanks for your feedback on FLIP-322: Cooldown period for
> >>>>> adaptive scheduler [1].
> >>>>>
> >>>>> This FLIP was discussed in [2].
> >>>>>
> >>>>> I'd like to start a vote for it. The vote will be open for at least 72
> >>>>> hours (until July 9th 15:00 GMT) unless there is an objection or
> >>>>> insufficient votes.
> >>>>>
> >>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-322+Cooldown+period+for+adaptive+scheduler
> >>>>> [2] https://lists.apache.org/thread/qvgxzhbp9rhlsqrybxdy51h05zwxfns6
> >>>>>
> >>>>> Best,
> >>>>>
> >>>>> Etienne
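
For readers who want to see why the restart counts in the quoted test results differ, here is a toy model of a cooldown period. It is only a sketch and not Flink's scheduler code: the function name `simulate_restarts`, the 7 slots per node (inferred from the parallelism steps 13, 20, 27, ...), the exact 30-second arrival times, and the assumption that the previous rescale happened 30 seconds before the first node arrived are all illustrative choices, not facts from the thread.

```python
def simulate_restarts(event_times_s, slots_per_node, start_parallelism,
                      cooldown_s, last_rescale_s=0):
    """Toy model: return the parallelism after each rescale.

    The first entry of the returned list is the starting parallelism, so
    the number of restarts is len(result) - 1. This is a hypothetical
    helper for illustration, not part of Flink.
    """
    parallelism = start_parallelism
    pending_slots = 0
    history = [parallelism]
    for t in event_times_s:
        pending_slots += slots_per_node  # a newly added node offers its slots
        # Rescale only if the cooldown since the last rescale has elapsed.
        # (A real scheduler would also need a timer so that pending slots are
        # applied once the cooldown ends; this sketch only reacts to events.)
        if t - last_rescale_s >= cooldown_s:
            parallelism += pending_slots
            pending_slots = 0
            last_rescale_s = t
            history.append(parallelism)
    return history


# Assumed scenario: 8 nodes arriving 30 s apart, 7 slots each, starting at 13.
node_arrivals_s = [30 * i for i in range(1, 9)]
default_run = simulate_restarts(node_arrivals_s, 7, 13, cooldown_s=0)
cooldown_run = simulate_restarts(node_arrivals_s, 7, 13, cooldown_s=240)

print(len(default_run) - 1, default_run)    # 8 [13, 20, 27, 34, 41, 48, 55, 62, 69]
print(len(cooldown_run) - 1, cooldown_run)  # 1 [13, 69]
```

Under these assumptions the model gives 8 restarts without a cooldown and a single restart (13 -> 69) with a 4-minute cooldown, the same counts as in the quoted table. It says nothing about records processed, which depends on recovery time and checkpointing.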