Re: [PR] [FLINK-39778][s3] Recoverable writer silently loses the in-flight tail on resume [flink]

via GitHub Fri, 05 Jun 2026 10:29:47 -0700


Samrat002 commented on PR #28268:
URL: https://github.com/apache/flink/pull/28268#issuecomment-4633923081


   > > Yes, you are right. That was discussed initially. What we observe at 
production scale is that users don't really set policies. Billions of MPUs 
accumulate, leading to high cost.
   > 
   > Could you please elaborate on how storing subparts in the state is linked 
to the billing problem? Aren't aborted MPUs introducing all the same dangling 
S3 objects?
   > 
   > My general question was more about why we store subparts as separate 
tail files on S3 to resume from. Are they as good as the inline Flink state in 
terms of data corruption risks?
   
   My bad, I misunderstood and conflated two different things. 
   
   Two reasons I went with S3 objects over inlining in state:
     1. Checkpoint cost. A tail can be as large as the part size, which is at 
least 5 MiB and often larger. Inlining it per writer per checkpoint inflates 
the checkpoint payload going through the JM/state backend. At scale that's a 
real cost vs. a single S3 PUT.
     2. Durability is the same. The state backend is usually S3 too, so a tail 
object gives us the same eleven nines of durability either way. State doesn't 
strengthen the guarantee, it just shifts where the bytes live.
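   
   To put rough numbers on point 1, here is a back-of-the-envelope sketch. 
The class name, writer count, and key length are hypothetical and not taken 
from the PR; only the 5 MiB tail size comes from the discussion above.

```java
// Back-of-the-envelope sketch; all inputs are illustrative assumptions.
final class TailCheckpointCost {

    /** Bytes added to each checkpoint if every writer inlines its tail in state. */
    static long inlinedBytes(int writers, long tailBytes) {
        return Math.multiplyExact((long) writers, tailBytes);
    }

    /** Bytes added to each checkpoint if state only holds an object key per writer. */
    static long referencedBytes(int writers, int keyBytes) {
        return Math.multiplyExact((long) writers, (long) keyBytes);
    }

    public static void main(String[] args) {
        int writers = 1_000;           // hypothetical number of writers
        long tail = 5L * 1024 * 1024;  // 5 MiB tail per writer
        int key = 256;                 // hypothetical key length in bytes

        System.out.println(inlinedBytes(writers, tail));   // 5242880000 (about 4.9 GiB)
        System.out.println(referencedBytes(writers, key)); // 256000 (250 KiB)
    }
}
```

   Under these assumptions the inlined variant adds gigabytes to every 
checkpoint, while the referenced variant adds kilobytes plus one PUT per 
writer.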
   
   On lifecycle, tail objects live under deterministic keys we own, and 
deletion is driven by our commit/abort/recovery path, not by a bucket 
lifecycle policy. So cleanup is as predictable as state GC, without the 
checkpoint-size hit.
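
   A minimal sketch of that lifecycle, assuming an in-memory map as a 
stand-in for the bucket. The class, method names, and key layout are 
hypothetical and are not the ones used in the PR; the point is only that a key 
derived from values already in state can be found and deleted without listing 
the bucket.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: an in-memory map stands in for the S3 bucket, and the
// key layout below is an assumption, not the layout used in the PR.
final class TailObjectStore {
    private final Map<String, byte[]> bucket = new ConcurrentHashMap<>();

    /** Same inputs always yield the same key, so recovery needs no listing. */
    static String tailKey(String objectKey, String uploadId) {
        return objectKey + "/_tail_" + uploadId;
    }

    /** On checkpoint: store the in-flight tail with one put, return its key. */
    String persistTail(String objectKey, String uploadId, byte[] tail) {
        String key = tailKey(objectKey, uploadId);
        bucket.put(key, tail.clone());
        return key;
    }

    /** On resume: read the tail back, or null if none was persisted. */
    byte[] readTail(String objectKey, String uploadId) {
        byte[] tail = bucket.get(tailKey(objectKey, uploadId));
        return tail == null ? null : tail.clone();
    }

    /** On commit or abort: delete the tail. Safe to call more than once. */
    boolean deleteTail(String objectKey, String uploadId) {
        return bucket.remove(tailKey(objectKey, uploadId)) != null;
    }
}
```

   Because deletion is keyed the same way as creation, a repeated commit or 
abort after a restart simply finds nothing left to delete.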


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
