Hi All,

I'd like to open a discussion on FLIP-576: Filesystem-Plugin Observability
for (flink-s3-fs-native)[1].

Apache Flink’s filesystem layer is critical to core operations such as
checkpoints, savepoints, and state access, most of which rely heavily on
S3. Despite this, the current S3 integration in Flink offers little
insight into underlying issues. Engineers lack visibility into key failure
signals, including S3 throttling, retry behaviour, slow operations, load
distribution, multipart upload leaks, and intermittent stream failures. As
a result, diagnosing production issues often requires manual correlation
across logs and external systems, making troubleshooting slow and
unreliable. This observability gap significantly impacts the operability of
Flink in real-world, large-scale deployments.

This FLIP proposes to close that gap by adding observability support to the
native S3 filesystem plugin (flink-s3-fs-native).

Looking forward to your feedback.

Bests,
Samrat

[1]
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=421957173
