Thanks, overall LGTM.

AK1_5: I see paused-partitions-count and paused-partitions both representing int gauges. Can we unify them into simply paused-partitions? A *-count suffix can also point users towards a windowed/cumulative count, which isn't the case here.

-Aditya

On 2026/04/09 12:43:10 PoAn Yang wrote:
> Hi Chia-Ping, Aditya, and Sahil,
>
> Thanks for your suggestion.
>
> chia_00, AK1, SD2: Align the naming is better. Change all metrics with
> prefix `paused-partitions`.
>
> chia_01: After checking the code again, it's better to follow current
> per-partition metrics like records-lag.
> I change per-partition paused-partitions* metrics to use INFO level.
>
> AK2: Change paused-partitions-count to consumer-coordinator-metrics group.
>
> AK3: Add paused-partitions-rate/paused-partitions-total to both consumer
> and per-partition levels.
> Since existing per-partition metrics use INFO level, I change to use INFO
> as well.
>
> AK4: Add a note about cardinality to consumer-fetch-manager-metrics
> paragraph.
>
> SD1: Change to use -1 as default value
> for paused-partitions-duration-seconds.
>
> SD3: Mention per-partition metrics are reset on partition reassignment in
> Proposed Changes.
>
> Kind regards,
> PoAn
>
> On Tue, Apr 7, 2026 at 12:17 PM Sahil Devgon <[email protected]> wrote:
>
> > Hello PoAn,
> > Thanks for the KIP, I have a few comments that we may consider adding to
> > the KIP:
> > 1. One thing I noticed is for partition-paused-time-ms, returning 0 when a
> > partition is not paused could be slightly ambiguous since it's the same
> > value a freshly paused partition would return. Would you consider returning
> > -1 to indicate "not paused" (consistent with how partition-paused uses
> > 0/1)? Or if 0 is preferred, a clear doc note would go a long way in
> > preventing false positives in monitoring setups.
> >
> > 2. Adding to Chia-Ping and Aditya's naming suggestions,
> > partition-paused-time-ms reads as "time in milliseconds" but semantically
> > it measures elapsed duration since pause. A name like
> > paused-partition-duration-ms/paused-partition-duration-seconds would better
> > communicate intent and align with naming conventions used in other Kafka
> > duration metrics (e.g., records-lag, fetch-latency-avg).
> >
> > 3. The test plan mentions verifying that the pause timestamp is "reset on
> > partition reassignment"; it would be helpful to also describe this
> > behavior explicitly in the Proposed Changes section, not just the test
> > plan. For example, calling out that the pause state is cleared on
> > reassignment regardless of prior pause status would make the spec feel
> > complete. This is especially relevant for rebalance-heavy workloads where
> > partitions move around frequently.
> >
> > Best,
> > Sahil Devgon
> >
> > On Mon, Apr 6, 2026 at 4:34 PM PoAn Yang <[email protected]> wrote:
> >
> > > Hello everyone,
> > >
> > > I would like to start a discussion thread on KIP-1304. In this KIP, we
> > > plan to add new consumer metrics about paused partitions.
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1304%3A+Add+consumer+metric+about+paused+partitions
> > >
> > > Please take a look and feel free to share any thoughts.
> > >
> > > Thanks,
> > > PoAn
> > >
