ableegoldman commented on a change in pull request #8697:
URL: https://github.com/apache/kafka/pull/8697#discussion_r429505029
##########
File path: streams/src/main/java/org/apache/kafka/streams/processor/internals/metrics/StreamsMetricsImpl.java
##########

@@ -149,6 +154,10 @@ public int hashCode() {
     public static final String RATE_DESCRIPTION_PREFIX = "The average number of ";
     public static final String RATE_DESCRIPTION_SUFFIX = " per second";
+    public static final int PERCENTILES_SIZE_IN_BYTES = 1000 * 1000; // 1 MB
+    public static long MAXIMUM_E2E_LATENCY = 10 * 24 * 60 * 60 * 1000L; // maximum latency is 10 days

Review comment:
10 days was just rounding up from the 7-day default retention limit. The maximum is needed by the percentiles calculation, which is based on incrementally sized buckets, so it's a tradeoff with accuracy. For example, if I increase the maximum by a factor of 1000, the `StreamTask` percentiles test is off by almost 20% (p99 comes out as 82.9 instead of 99). That test uses values between 0 and 100, which is probably considerably lower than most e2e latencies will be. If you look at the `MetricsTest` percentiles test I added, it uses random values up to the max value and can maintain the 10% accuracy up to a higher maximum. Of course we don't know what the distribution will be, but it seems likely to be somewhere in the middle (not in the 100s of ms, and not in the 10s or 1000s of days), so for reasonable accuracy we need to pick a reasonable maximum. We can definitely go higher than 10 days, but I reasoned that if you have records older than 10 days you're probably processing historical data, and in that case the e2e latency isn't that meaningful.
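For context, here is a minimal sketch of the size/max/accuracy tradeoff being discussed, using the public `org.apache.kafka.common.metrics.stats.Percentiles` stat (which uses fixed-size, incrementally sized buckets). This is not the actual PR code: the class name, metric names, and group name below are placeholders, and the constants simply mirror the values shown in the diff above. With the bucket memory (`sizeInBytes`) held constant, raising the maximum widens each bucket and degrades percentile accuracy.

```java
import org.apache.kafka.common.metrics.Metrics;
import org.apache.kafka.common.metrics.Sensor;
import org.apache.kafka.common.metrics.stats.Percentile;
import org.apache.kafka.common.metrics.stats.Percentiles;
import org.apache.kafka.common.metrics.stats.Percentiles.BucketSizing;

public class E2eLatencyPercentilesSketch {

    // Values mirroring the constants in the diff above
    static final int PERCENTILES_SIZE_IN_BYTES = 1000 * 1000;          // 1 MB backing the histogram buckets
    static final long MAXIMUM_E2E_LATENCY = 10 * 24 * 60 * 60 * 1000L; // 10 days, in milliseconds

    public static void main(final String[] args) {
        final Metrics metrics = new Metrics();
        final Sensor sensor = metrics.sensor("e2e-latency");

        // The bucket layout is fixed up front by (sizeInBytes, min, max, bucketing);
        // a larger max with the same sizeInBytes means coarser buckets and less accurate percentiles.
        sensor.add(new Percentiles(
            PERCENTILES_SIZE_IN_BYTES,
            0,
            MAXIMUM_E2E_LATENCY,
            BucketSizing.LINEAR,
            new Percentile(metrics.metricName("e2e-latency-p90", "example-group"), 90),
            new Percentile(metrics.metricName("e2e-latency-p99", "example-group"), 99)
        ));

        // Record some sample latencies (ms) and print the percentile estimates
        for (long latencyMs = 1; latencyMs <= 100; latencyMs++) {
            sensor.record(latencyMs);
        }
        metrics.metrics().forEach((name, metric) -> {
            if (name.name().startsWith("e2e-latency-p")) {
                System.out.println(name.name() + " = " + metric.metricValue());
            }
        });
    }
}
```

As the comment notes, with a small-valued sample like the 0–100 range above, multiplying `MAXIMUM_E2E_LATENCY` by 1000 (everything else unchanged) pushes the reported p99 well away from the true value, which is why the maximum is capped at a "reasonable" 10 days rather than something far larger.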