Hi Manan,

Thanks for the update.

Regarding the time-based retention metrics, I agree with your reasoning.
But I think we should add them to the Rejected Alternatives section for
future reference.

Otherwise, LGTM!

Luke


On Mon, Feb 2, 2026 at 3:25 PM Manan Gupta <[email protected]> wrote:

> Hi Luke,
>
> Thanks for the thoughtful feedback — I appreciate you taking a close look
> at the KIP.
>
> On the naming point, I agree that SizeInPercent can be ambiguous and might
> be interpreted as disk utilization. Following your suggestion, I’ve updated
> the KIP to use *RetentionSizeInPercent*, which more clearly communicates
> that the metric represents a partition’s log size relative to its
> configured retention limit rather than physical disk capacity.
> Additionally, the metric is scoped using topic and partition tags. This
> allows for strong correlation with other per-partition metrics and makes
> the context of the value explicit.
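
For readers skimming the thread, the renamed metric can be read as the
partition's log size divided by its size-based retention limit. A minimal
sketch in Python, assuming the limit is the topic's retention.bytes and that
a topic with no size limit (retention.bytes = -1) reports no value. The
function name and the None convention are illustrative only and are not
taken from the KIP:

```python
from typing import Optional


def retention_size_in_percent(log_size_bytes: int, retention_bytes: int) -> Optional[float]:
    """Partition log size as a percentage of its size-based retention limit.

    Hypothetical helper for illustration; not the KIP's implementation.
    """
    # Assumption: a non-positive limit (e.g. retention.bytes = -1) means
    # size-based retention is disabled, so there is nothing to compare against.
    if retention_bytes <= 0:
        return None
    return 100.0 * log_size_bytes / retention_bytes


# A partition holding 768 MiB against a 1 GiB limit sits at 75% of its
# retention size, regardless of how full the underlying disk is.
print(retention_size_in_percent(768 * 1024**2, 1024**3))
```

The value says nothing about physical disk capacity, which is the ambiguity
the rename is meant to remove.
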
>
> Regarding a retention-time–based metric (for example, exposing the time
> until the oldest segment expires), I considered this but am not convinced
> it would be particularly actionable in practice. For topics older than the
> configured retention time that have ongoing production, log segments tend
> to remain near expiration continuously. This would cause such a metric to
> hover close to zero in steady state. As a result, it may not provide
> meaningful operational signal to topic owners, since data older than the
> retention time is expected to be eligible for deletion.
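
The steady-state argument above can be sketched with an idealized model. The
assumptions are mine, not the KIP's: segments roll at a fixed interval, a
closed segment expires exactly one retention period after it closes, and the
retention check interval is ignored. The function name is hypothetical:

```python
def time_until_oldest_expires(now_s: int, retention_s: int, roll_s: int) -> int:
    """Seconds until the oldest retained segment expires (idealized model).

    Segment k closes at k * roll_s and expires at k * roll_s + retention_s.
    """
    # Oldest segment that has not yet expired: the smallest k (at least 1)
    # with k * roll_s + retention_s > now_s.
    k = max(1, (now_s - retention_s) // roll_s + 1)
    return k * roll_s + retention_s - now_s


retention_s = 7 * 24 * 3600  # 7-day retention
roll_s = 3600                # a new segment every hour

# Once the topic is older than the retention time, the value never exceeds
# one roll interval, which is small compared with the retention period.
samples = [
    time_until_oldest_expires(t, retention_s, roll_s)
    for t in range(retention_s + roll_s, 3 * retention_s, 997)
]
print(max(samples) <= roll_s)
```

In this model the metric stays within one roll interval of zero for any topic
older than its retention time, which matches the concern that it would carry
little operational signal.
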
>
> For this reason, the current KIP focuses on size-based retention metrics,
> which more directly indicate proximity to retention-triggered cleanup due
> to storage pressure. I’m happy to revisit time-based metrics separately if
> there are concrete use cases where they would add clear operational value.
>
> Thanks again for the suggestions — please let me know if you see any other
> areas that could benefit from clarification.
>
>
> On Fri, Jan 16, 2026 at 3:49 PM Luke Chen <[email protected]> wrote:
>
> > Hi Manan,
> >
> > Thanks for the KIP.
> >
> > 1. I agree this is a good improvement, but the naming is not clear IMO.
> > "SizeInPercent" makes me think the disk is going to be full once it
> > reaches 100%. Maybe "RetentionSizeInPercent"?
> >
> > 2. Do we need a similar metric for time-based retention?
> > For example, "RetentionTimeInSec" would show the time remaining before
> > the oldest segment reaches the retention time: "RetentionTimeInSec = 300"
> > means the oldest segment will expire in 300 seconds. Is that useful?
> >
> >
> > Thank you,
> > Luke
> >
> > On Mon, Jan 12, 2026 at 6:17 PM Manan Gupta <[email protected]> wrote:
> >
> > > Gentle reminder for feedback on the KIP-1257: Partition Size Percentage
> > > Metrics for Storage Monitoring proposal.
> > >
> > > On Tue, Dec 16, 2025 at 5:34 PM Manan Gupta <[email protected]> wrote:
> > >
> > > > Hi all,
> > > >
> > > > This email starts the discussion thread for KIP-1257: Partition Size
> > > > Percentage Metrics for Storage Monitoring. This KIP introduces
> > > > retention-aware, percentage-based partition metrics that significantly
> > > > improve Kafka’s storage observability. The proposed metrics simplify
> > > > alerting, enhance capacity planning, and provide clear visibility into
> > > > retention pressure—especially for tiered storage—while remaining
> > > > lightweight, backward compatible, and operationally intuitive.
> > > >
> > > > I'd appreciate your initial thoughts and feedback on the proposal.
> > > > https://cwiki.apache.org/confluence/x/MAEXG
> > > >
> > > >
> > > > Thanks,
> > > > Manan
> > > >
> > >
> >
>
