Hi all,

I'd like to open a discussion on FLIP-576: Filesystem-Plugin Observability for flink-s3-fs-native [1].

Apache Flink's filesystem layer is critical to core operations such as checkpoints, savepoints, and state access, most of which rely heavily on S3. Despite this, the current integration between Flink and S3 offers little insight into underlying issues. Engineers lack visibility into key failure signals, including S3 throttling, retry behaviour, slow operations, load distribution, multipart upload leaks, and intermittent stream failures. As a result, diagnosing production issues often requires manual correlation across logs and external systems, making troubleshooting slow and unreliable. This observability gap significantly impacts the operability of Flink in real-world, large-scale deployments.

This FLIP addresses this gap and builds the corresponding support for the native S3 filesystem.

Looking forward to your feedback.

Bests,
Samrat

[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=421957173
