Hi Piotr,

I also agree with Zhanghao's assessment of the limitations of unaligned
checkpoints [1]. Some of them are already handled properly by Flink, but in
the case of the "Interplay with watermarks" limitation, it is quite confusing
for a new user to find that their code doesn't generate consistent results
with the default checkpoint configuration. Is there a way for Flink to
detect and handle this situation correctly?

[1]
https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/ops/state/checkpointing_under_backpressure/#limitations

Best,
Mason

On Mon, Jan 8, 2024 at 2:01 AM yangpmldl <yangpm...@163.com> wrote:

> Unsubscribe
>
> At 2024-01-08 17:45:01, "Piotr Nowojski" <pnowoj...@apache.org> wrote:
> >Hi, thanks for the responses,
> >
> >And thanks for pointing out the job upgrade issue. Indeed, that had
> >slipped my mind. I was mistakenly thinking that we support all upgrades
> >only via savepoints. Anyway, maybe in that case we should guide users
> >towards that, i.e. using savepoints for upgrades? That would be even
> >easier for users to understand:
> >- use unaligned checkpoints for checkpoints
> >- use savepoints for any changes in the job/version upgrades
> >
> >There is a downside, that savepoints are always full, while aligned
> >checkpoints can be incremental.
> >
> >WDYT?
> >
> >Regarding the value for the timeout, I would also be fine with 30s. Indeed
> >that's a safer default.
> >
> >> On a separate point, in the sentence below it seems to me it would be
> >> clearer to say that in the unlikely scenario you've described, the change
> >> would "significantly increase checkpoint sizes" -- assuming I understand
> >> things correctly.
> >
> >I've reworded that paragraph.
> >
> >Best,
> >Piotrek
> >
> >
> >
> >On Mon, Jan 8, 2024 at 08:02, Rui Fan <1996fan...@gmail.com> wrote:
> >
> >> Thanks to Piotr for driving this proposal!
> >>
> >> Enabling unaligned checkpoints with an aligned checkpoint timeout
> >> is fine with me. I'm not sure if an aligned checkpoint timeout of 5s
> >> is too aggressive. If unaligned checkpoints are enabled by default
> >> for all jobs, I recommend that the aligned checkpoint timeout be
> >> at least 30s.
> >>
> >> If 30s is too long for some Flink jobs, users can turn
> >> it down themselves.
> >>
> >> To David, Ken and Zhanghao:
> >>
> >> Unaligned checkpoints indeed have more limitations than aligned
> >> checkpoints, but if we set the aligned checkpoint timeout to 30s or 60s,
> >> then whenever a checkpoint can be completed within 30s or 60s, the job
> >> still uses aligned checkpoints (no extra overhead is introduced).
> >> When a checkpoint cannot be completed within the aligned checkpoint
> >> timeout, it switches from aligned to unaligned. An unaligned checkpoint
> >> can still complete when backpressure is severe.
> >>
> >> In brief, when backpressure is low, enabling them introduces no extra
> >> overhead; when backpressure is high, enabling them has some benefits.
> >>
> >> So I think there is not much risk when the aligned checkpoint timeout
> >> is set to 30s or above. WDYT?
> >>
> >> Best,
> >> Rui
> >>
> >> On Mon, Jan 8, 2024 at 12:57 PM Zhanghao Chen <zhanghao.c...@outlook.com>
> >> wrote:
> >>
> >> > Hi Piotr,
> >> >
> >> > As a platform administrator who runs thousands of Flink jobs, I'd be
> >> > against the idea of enabling unaligned checkpoints by default for our
> >> > jobs. It may help a significant portion of the users, but the subtle
> >> > issues around unaligned checkpoints for a few jobs will probably
> >> > trigger a lot more on-call alerts and incidents. From my point of view,
> >> > we'd better not enable it by default before removing all the
> >> > limitations listed in
> >> > https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/ops/state/checkpointing_under_backpressure/#limitations
> >> > .
> >> >
> >> > Best,
> >> > Zhanghao Chen
> >> > ________________________________
> >> > From: Piotr Nowojski <pnowoj...@apache.org>
> >> > Sent: Friday, January 5, 2024 21:41
> >> > To: dev <dev@flink.apache.org>
> >> > Subject: FLIP-413: Enable unaligned checkpoints by default
> >> >
> >> > Hi!
> >> >
> >> > I would like to propose enabling unaligned checkpoints by default and
> >> > simultaneously increasing the aligned checkpoint timeout from 0ms to 5s.
> >> > I think this change is the right one for the majority of Flink users.
> >> >
> >> > For more rationale, please take a look at the short FLIP-413 [1].
> >> >
> >> > What do you all think?
> >> >
> >> > Best,
> >> > Piotrek
> >> >
> >> >
> >> > [1]
> >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-413%3A+Enable+unaligned+checkpoints+by+default
>
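
For readers following the thread, the fallback behaviour Rui describes above (start aligned, switch to unaligned once the aligned checkpoint timeout is exceeded) can be sketched in plain Java. This is only an illustrative model, not Flink code: the class `AlignedTimeoutSketch`, the `Mode` enum, and the `resolveMode` helper are hypothetical names introduced here, and the model assumes a timeout of 0 means checkpoints start unaligned right away.

```java
import java.time.Duration;

public class AlignedTimeoutSketch {

    /** Mode a checkpoint ends up in under this simplified model. */
    public enum Mode { ALIGNED, UNALIGNED }

    // Hypothetical helper for illustration only; this is not a Flink API.
    public static Mode resolveMode(boolean unalignedEnabled,
                                   Duration alignedTimeout,
                                   Duration alignmentTime) {
        // With unaligned checkpoints disabled, the timeout is irrelevant.
        if (!unalignedEnabled) {
            return Mode.ALIGNED;
        }
        // Assumption: a timeout of 0 means checkpoints start unaligned.
        if (alignedTimeout.isZero()) {
            return Mode.UNALIGNED;
        }
        // Otherwise start aligned and switch only if alignment takes too long.
        return alignmentTime.compareTo(alignedTimeout) > 0
                ? Mode.UNALIGNED
                : Mode.ALIGNED;
    }

    public static void main(String[] args) {
        Duration timeout = Duration.ofSeconds(30);
        // Low backpressure: alignment finishes well within the timeout.
        System.out.println(resolveMode(true, timeout, Duration.ofSeconds(2)));    // ALIGNED
        // High backpressure: alignment exceeds the timeout.
        System.out.println(resolveMode(true, timeout, Duration.ofSeconds(45)));   // UNALIGNED
        // Timeout of 0ms with unaligned checkpoints enabled.
        System.out.println(resolveMode(true, Duration.ZERO, Duration.ofSeconds(2))); // UNALIGNED
    }
}
```

In an actual job, the corresponding knobs should be `CheckpointConfig#enableUnalignedCheckpoints` and `CheckpointConfig#setAlignedCheckpointTimeout`; the sketch leaves them out so that it runs without a Flink dependency.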
