Hi Team, I'd like to initiate a discussion on a feature that appears to be valuable for Flink + Iceberg users. (Related issue: #14662 <https://github.com/apache/iceberg/issues/14662>)
Currently, Iceberg's FlinkSink offers a set of data statistics <https://github.com/apache/iceberg/blob/main/flink/v2.1/flink/src/main/java/org/apache/iceberg/flink/sink/CommitSummary.java#L33-L38> in snapshot summary. However, there is no mechanism for application-level code to populate *custom/application-defined* statistics into Iceberg snapshot properties at commit time. An example use case: A Flink job computes the event-time boundaries for data ingested in each checkpoint (min/max event time in that batch) and aims to include this information in the snapshot summary, alongside the built-in statistics. The snapshot summary is a natural place for such metadata, since the statistics directly describe the data in that specific snapshot and belong with the snapshot itself. At the same time, the logic for computing these statistics is application-specific, making it difficult to handle entirely within the Iceberg framework. We've explored workarounds (such as static variables and external store, see this PR <https://github.com/apache/iceberg/pull/14594>) to pass these values to the committer, but these approaches are either not robust or add unnecessary complexity. Is there a recommended Flink-native approach for allowing applications to propagate custom, per-checkpoint metadata from Flink operators to the Iceberg committer to write to snapshot summary? If not, would the community be interested in supporting such a feature? Any guidance or pointers to related work would be appreciated. We’re also happy to contribute if this aligns with project goals. Thanks, Owen
