Hi Luke,

Thanks for the thoughtful feedback. I appreciate you taking a close look at the KIP.

On the naming point, I agree that SizeInPercent is ambiguous and could be read as disk utilization. Following your suggestion, I've updated the KIP to use *RetentionSizeInPercent*, which makes it clearer that the metric represents a partition's log size relative to its configured retention limit rather than physical disk capacity. The metric is also scoped with topic and partition tags, which makes the context of the value explicit and lets it be correlated with other per-partition metrics.

Regarding a retention-time-based metric (for example, exposing the time until the oldest segment expires), I considered this but am not convinced it would be actionable in practice. For a topic that is older than its configured retention time and still receiving writes, the oldest segment is almost always close to expiration, so such a metric would hover near zero in steady state. It would therefore give topic owners little operational signal, since data older than the retention time is expected to be eligible for deletion anyway.

For this reason, the current KIP focuses on size-based retention metrics, which more directly indicate how close a partition is to retention-triggered cleanup caused by storage pressure. I'm happy to revisit time-based metrics separately if there are concrete use cases where they would add clear operational value.

Thanks again for the suggestions. Please let me know if you see any other areas that could benefit from clarification.

On Fri, Jan 16, 2026 at 3:49 PM Luke Chen <[email protected]> wrote:

> Hi Manan,
>
> Thanks for the KIP.
>
> 1. I agree this is a good improvement, but the naming is not clear IMO.
> "SizeInPercent" makes me think the disk is going to be full after it's
> 100%.
> Maybe "RetentionSizeInPercent"?
>
> 2. Do we need the similar metrics for the time retention?
> Like "RetentionTimeInSec", which is to show the time gap between oldest
> segment with the retention time, "RetentionTimeInSec = 300" means the
> oldest segment will be expired after 300 seconds. Is that useful?
>
>
> Thank you,
> Luke
>
> On Mon, Jan 12, 2026 at 6:17 PM Manan Gupta <[email protected]> wrote:
>
> > Gentle reminder for feedback on the KIP-1257: Partition Size Percentage
> > Metrics for Storage Monitoring proposal.
> >
> > On Tue, Dec 16, 2025 at 5:34 PM Manan Gupta <[email protected]> wrote:
> >
> > > Hi all,
> > >
> > > This email starts the discussion thread for KIP-1257: Partition Size
> > > Percentage Metrics for Storage Monitoring. This KIP introduces
> > > retention-aware, percentage-based partition metrics that significantly
> > > improve Kafka’s storage observability. The proposed metrics simplify
> > > alerting, enhance capacity planning, and provide clear visibility into
> > > retention pressure, especially for tiered storage, while remaining
> > > lightweight, backward compatible, and operationally intuitive.
> > >
> > > I'd appreciate your initial thoughts and feedback on the proposal.
> > > https://cwiki.apache.org/confluence/x/MAEXG
> > >
> > >
> > > Thanks,
> > > Manan
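
For reference, the two calculations discussed in this thread can be sketched roughly as below. This is only an illustration and not the KIP's implementation: the function names and parameters are hypothetical, and returning None when no size limit is configured (retention.bytes <= 0) is an assumption of this sketch, not something the KIP specifies.

```python
from typing import Optional


def retention_size_in_percent(log_size_bytes: int, retention_bytes: int) -> Optional[float]:
    """Partition log size as a percentage of its size-based retention limit.

    Hypothetical helper for illustration only.
    """
    if retention_bytes <= 0:
        # Assumption: with no size limit configured (e.g. retention.bytes = -1),
        # a percentage is undefined, so report nothing.
        return None
    return 100.0 * log_size_bytes / retention_bytes


def seconds_until_oldest_segment_expires(
    now_ms: int, oldest_segment_ts_ms: int, retention_ms: int
) -> float:
    """Time left before the oldest segment passes the retention time.

    Hypothetical helper; which segment timestamp is used is an assumption.
    """
    remaining_ms = retention_ms - (now_ms - oldest_segment_ts_ms)
    # Clamp at zero: a segment already past retention is simply eligible for deletion.
    return max(0.0, remaining_ms / 1000.0)


# A partition holding 750 bytes against a 1000-byte limit reports 75.0.
print(retention_size_in_percent(750, 1000))
```

With continuous production on a topic older than its retention time, the second function returns values close to zero on almost every call, which is the steady-state behavior described in the reply above.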
