Hi Samrat,

I've checked it and it looks good from my side.

BR,
G


On Thu, May 28, 2026 at 10:06 AM Aleksandr Iushmanov <[email protected]>
wrote:

> Thank you Samrat,
>
> Looks good to me!
>
> Kind regards,
> Alex
>
>
> On Wed, 27 May 2026 at 17:25, Samrat Deb <[email protected]> wrote:
>
> > Hi Aleksandr Iushmanov,
> >
> > > The proposal overall looks good to me, but I have a concern around the
> > > number of metrics we enable by default. As you have mentioned in the doc,
> > > the number of added time series is ~50. I have a feeling that enabling them
> > > by default may lead to unpleasant surprises in terms of extra cardinality
> > > and the volume of exported data unless it is guarded through allowlists. My
> > > personal preference would be to keep this option opt-in.
> >
> > Thank you for the suggestion. Opt-in makes sense. It would allow users
> > to decide the cardinality of metrics within their setup.
> > Here is the change I plan to make to the FLIP:
> >
> >   s3.metrics.enabled: true
> >   s3.metrics.allowlist:
> >     - api_call_count
> >     - api_call_duration_ms
> >     - throttle_count
> >     - retry_count
> >     - iops
> >     - mpu_aborted_total
> >   s3.metrics.detailed.enabled: false
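For illustration only, here is a minimal sketch of how an allowlist like the one above could gate metric registration. The class `MetricAllowlist` and its method `shouldRegister` are hypothetical names, not part of the FLIP or of Flink. The semantics are also an assumption: a metric is registered only when metrics are enabled and its name is on the allowlist.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: decides whether a metric should be registered,
// assuming "enabled" maps to s3.metrics.enabled and "allowed" to s3.metrics.allowlist.
class MetricAllowlist {
    private final boolean enabled;
    private final Set<String> allowed;

    MetricAllowlist(boolean enabled, Set<String> allowed) {
        this.enabled = enabled;
        this.allowed = new HashSet<>(allowed); // defensive copy
    }

    // Assumed semantics: register only if metrics are on AND the name is allowlisted.
    boolean shouldRegister(String metricName) {
        return enabled && allowed.contains(metricName);
    }

    public static void main(String[] args) {
        Set<String> names = new HashSet<>(Arrays.asList(
                "api_call_count", "api_call_duration_ms", "throttle_count",
                "retry_count", "iops", "mpu_aborted_total"));
        MetricAllowlist allowlist = new MetricAllowlist(true, names);
        System.out.println(allowlist.shouldRegister("throttle_count"));
        System.out.println(allowlist.shouldRegister("some_other_metric"));
    }
}
```

With the six names listed above, `throttle_count` would be registered and any name outside the list would be skipped.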
> >
> >
> > Best,
> > Samrat
> >
> >
> >
> > On Fri, May 22, 2026 at 5:26 PM Gabor Somogyi <[email protected]>
> > wrote:
> >
> > > @Samrat
> > > Thanks for the detailed explanation for the metrics usage.
> > >
> > > Throttling is not supported by the current implementation even though
> > > we plan to add metrics for it. It's good to go, however; I'm about to
> > > add throttling support soon.
> > >
> > > ------------
> > >
> > > One small API refinement worth considering: instead of adding a second
> > > "configure(Configuration, MetricGroup)"
> > > overload to FileSystemFactory, introduce a separate opt-in interface:
> > >
> > > public interface MetricsAware {
> > >     void setMetricGroup(MetricGroup metricGroup);
> > > }
> > >
> > > Then inside FileSystem.initialize():
> > > for (FileSystemFactory factory : factories) {
> > >     if (factory instanceof MetricsAware) {
> > >         ((MetricsAware) factory).setMetricGroup(metricGroup);
> > >     }
> > > }
> > >
> > > This keeps FileSystemFactory's contract unchanged: third-party
> > > implementations need zero modifications unless they want metrics.
> > > The FLIP's default-on collection is fine; this is purely an interface
> > > hygiene suggestion.
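A self-contained sketch of the opt-in interface idea described above, for illustration. `MetricGroup` and `FileSystemFactory` are simplified stand-ins declared locally, not Flink's real types. `S3LikeFactory`, `LegacyFactory` and the `initialize` helper are hypothetical names and do not reflect the actual `FileSystem.initialize()` signature.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for Flink's MetricGroup (assumption: the real interface is far richer).
interface MetricGroup {
    String name();
}

// Stand-in for FileSystemFactory; its contract is left untouched.
interface FileSystemFactory {
    String getScheme();
}

// The opt-in interface from the suggestion above.
interface MetricsAware {
    void setMetricGroup(MetricGroup metricGroup);
}

// Hypothetical factory that wants metrics: it implements both interfaces.
class S3LikeFactory implements FileSystemFactory, MetricsAware {
    MetricGroup metricGroup; // stays null until a group is injected

    @Override
    public String getScheme() {
        return "s3";
    }

    @Override
    public void setMetricGroup(MetricGroup metricGroup) {
        this.metricGroup = metricGroup;
    }
}

// Hypothetical third-party factory that ignores metrics and needs no changes.
class LegacyFactory implements FileSystemFactory {
    @Override
    public String getScheme() {
        return "legacy";
    }
}

class MetricsAwareSketch {
    // Mirrors the loop above; returns how many factories received the metric group.
    static int initialize(List<FileSystemFactory> factories, MetricGroup metricGroup) {
        int injected = 0;
        for (FileSystemFactory factory : factories) {
            if (factory instanceof MetricsAware) {
                ((MetricsAware) factory).setMetricGroup(metricGroup);
                injected++;
            }
        }
        return injected;
    }

    public static void main(String[] args) {
        List<FileSystemFactory> factories = new ArrayList<>();
        factories.add(new S3LikeFactory());
        factories.add(new LegacyFactory());
        MetricGroup group = () -> "filesystem";
        System.out.println(initialize(factories, group)); // prints 1
    }
}
```

Only the factory that implements `MetricsAware` receives the group; the other one compiles and runs unchanged, which is the point of keeping the base contract as it is.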
> > >
> > > @Aleksandr
> > > If opt-in means "s3.metrics.enabled" defaults to "false", I'd say that's
> > > not the way to go.
> > > Observability features that require pre-incident configuration tend to
> > > never get enabled, which directly defeats the FLIP's stated goal of
> > > closing the operational blindness gap.
> > >
> > > The concern about cardinality is legitimate, but the math is favorable:
> > > these ~50 series are at TM scope, not subtask scope. A 100-TM cluster
> > > adds roughly 5,000 series, which is modest compared to what
> > > operator-level metrics already emit.
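The arithmetic above can be written out as a small sketch. `CardinalityEstimate` and its methods are hypothetical names. The subtask-scope comparison uses 8 subtasks per TM, a made-up figure chosen only to show the difference in scale.

```java
// Hypothetical helper for back-of-the-envelope series counts.
class CardinalityEstimate {
    // TM scope: each TaskManager exports a fixed number of series.
    static long tmScopeSeries(long seriesPerTaskManager, long taskManagers) {
        return Math.multiplyExact(seriesPerTaskManager, taskManagers);
    }

    // Subtask scope (for comparison only): the same series repeated per subtask.
    static long subtaskScopeSeries(long seriesPerSubtask, long taskManagers, long subtasksPerTaskManager) {
        return Math.multiplyExact(seriesPerSubtask,
                Math.multiplyExact(taskManagers, subtasksPerTaskManager));
    }

    public static void main(String[] args) {
        // ~50 series on a 100-TM cluster, as in the estimate above.
        System.out.println(tmScopeSeries(50, 100)); // prints 5000
        // Assumed 8 subtasks per TM, purely illustrative.
        System.out.println(subtaskScopeSeries(50, 100, 8)); // prints 40000
    }
}
```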
> > >
> > > The right answer is informed default-on with a clear escape hatch. The FLIP
> > > already has the split between basic (default-on, bounded cardinality) and
> > > detailed (opt-in via "s3.metrics.detailed.enabled").
> > > Teams with strict cardinality budgets can also suppress the entire group at
> > > the reporter level with a single line:
> > > metrics.reporter.<name>.filter.excludes = *.filesystem.*:*:*
> > >
> > > During performance testing we intend to measure things in depth, and if
> > > something blows up, fine-tuning is still a possibility during PR review.
> > >
> > > G
> > >
> > >
> > > On Thu, May 21, 2026 at 6:12 PM Aleksandr Iushmanov <[email protected]>
> > > wrote:
> > >
> > > > Hi Samrat,
> > > >
> > > > Thank you for putting it together. I believe that this is a good addition
> > > > to ensure that Flink is operation ready.
> > > >
> > > > The proposal overall looks good to me, but I have a concern around the
> > > > number of metrics we enable by default. As you have mentioned in the doc,
> > > > the number of added time series is ~50. I have a feeling that enabling them
> > > > by default may lead to unpleasant surprises in terms of extra cardinality
> > > > and the volume of exported data unless it is guarded through allowlists. My
> > > > personal preference would be to keep this option opt-in.
> > > >
> > > > Please let me know your thoughts on this.
> > > >
> > > > Kind regards,
> > > > Alex
> > > >
> > > >
> > > > On Tue, 5 May 2026 at 10:58, Samrat Deb <[email protected]> wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > I'd like to open a discussion on FLIP-576: Filesystem-Plugin Observability
> > > > > for (flink-s3-fs-native)[1].
> > > > >
> > > > > Apache Flink’s filesystem layer is critical to core operations like
> > > > > checkpoints, savepoints, and state access, most of which rely heavily on
> > > > > S3. Despite this, the current observability of the S3 integration in Flink
> > > > > offers little insight into underlying issues. Engineers lack visibility
> > > > > into key failure signals, including S3 throttling, retry behaviour, slow
> > > > > operations, load distribution, multipart upload leaks, and intermittent
> > > > > stream failures. As a result, diagnosing production issues often requires
> > > > > manual correlation across logs and external systems, making
> > > > > troubleshooting slow and unreliable. This observability gap significantly
> > > > > impacts the operability of Flink in real-world large-scale deployments.
> > > > > This FLIP addresses that gap and builds support for the native S3 FS.
> > > > >
> > > > > Looking forward to your feedback.
> > > > >
> > > > > Bests,
> > > > > Samrat
> > > > >
> > > > > [1]
> > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=421957173
