stevenzwu commented on a change in pull request #3001:
URL: https://github.com/apache/iceberg/pull/3001#discussion_r696055317
##########
File path: flink/src/main/java/org/apache/iceberg/flink/sink/FlinkSink.java
##########
@@ -249,6 +251,21 @@ public Builder uidPrefix(String newPrefix) {
return this;
}
+ /**
+ * Set the {@link SlidingWindowReservoir} size (number of measurements stored)
+ * for the two histogram metrics of data files and delete file size distribution.
+ *
+ * @param newReservoirSize the new histogram reservoir size for the file size distribution.
+ * default reservoir size is 128, which only add a small memory overhead of 1 KB (128 x 8B) per histogram.
+ * For use cases with a lot of files, a larger reservoir size can produce more accurate histogram distribution.
+ */
+ public Builder fileSizeHistogramReservoirSize(int newReservoirSize) {
Review comment:
If the Flink job commits a lot of files per cycle, we need to increase the
reservoir size to get more accurate distribution metrics. The setting belongs
on the sink because the Flink sink is what publishes the histogram metrics
that this config applies to.
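
To illustrate why the reservoir size matters when a single commit covers many
files, here is a minimal sketch. It is not the Dropwizard or Iceberg
implementation: the class SlidingWindowSketch, its methods, and the file sizes
are all hypothetical. It only assumes that a sliding window reservoir keeps the
most recent N measurements, as the Javadoc above describes.

```java
import java.util.Arrays;

// Hypothetical stand-in for a fixed-size sliding window reservoir.
// It keeps only the most recent `size` measurements, one long (8 bytes) each.
class SlidingWindowSketch {
  private final long[] measurements;
  private long count;

  SlidingWindowSketch(int size) {
    this.measurements = new long[size];
  }

  // Overwrite the oldest slot once the window is full.
  void update(long value) {
    measurements[(int) (count % measurements.length)] = value;
    count++;
  }

  // Sorted copy of the measurements currently retained.
  long[] snapshot() {
    int retained = (int) Math.min(count, measurements.length);
    long[] copy = Arrays.copyOf(measurements, retained);
    Arrays.sort(copy);
    return copy;
  }

  // Memory used by the window itself: size x 8 bytes.
  static long overheadBytes(int size) {
    return (long) size * Long.BYTES;
  }

  // Made-up commit of 1000 files: 900 files of 1 MB followed by 100 files of 100 MB.
  // Returns the median file size in MB as seen through a reservoir of the given size.
  static long medianAfterCommit(int reservoirSize) {
    SlidingWindowSketch reservoir = new SlidingWindowSketch(reservoirSize);
    for (int i = 0; i < 900; i++) {
      reservoir.update(1L);
    }
    for (int i = 0; i < 100; i++) {
      reservoir.update(100L);
    }
    long[] sorted = reservoir.snapshot();
    return sorted[sorted.length / 2];
  }

  public static void main(String[] args) {
    System.out.println("overhead at 128: " + overheadBytes(128) + " bytes");
    System.out.println("median with 128: " + medianAfterCommit(128) + " MB");
    System.out.println("median with 1024: " + medianAfterCommit(1024) + " MB");
  }
}
```

In this made-up commit the window of 128 only sees the last 128 files, so the
reported median is skewed toward the large files written last, while a window
of 1024 retains all 1000 measurements. The cost of the larger window is 8 KB
instead of 1 KB per histogram.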