Thanks Fabian.

Do you know if the legacy StreamingFileSink has the same issues? Not asking
you to do research but just wondering if you happen to know.


On Tue, Nov 29, 2022 at 8:46 AM Fabian Paul <fp...@apache.org> wrote:

> Hi folks,
>
> I did some initial investigation, and the problem seems twofold.
>
> If no post-commit topology is used, we do not run into a problem where
> we could lose data but since we do not clean up the state correctly,
> we will hit this [1] when trying to stop the pipeline with a savepoint
> after we have started it from a savepoint.
> AFAICT all two-phase commit sinks are affected Kafka, File etc.
>
> For sinks using the post-commit topology, the same applies.
> Additionally, we might never do the commit from the post-commit
> topology resulting in lost data.
>
> Best,
> Fabian
>
> [1]
> https://github.com/apache/flink/blob/ed46cb2fd64f1cb306ae5b7654d2b4d64ab69f22/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/operators/sink/committables/CheckpointCommittableManagerImpl.java#L83
>

Reply via email to