Hi Luke,

Thanks for the thoughtful feedback — I appreciate you taking a close look
at the KIP.

On the naming point, I agree that SizeInPercent can be ambiguous and might
be interpreted as disk utilization. Following your suggestion, I’ve updated
the KIP to use *RetentionSizeInPercent*, which more clearly communicates
that the metric represents a partition’s log size relative to its
configured retention limit rather than physical disk capacity.
Additionally, the metric is scoped using topic and partition tags. This
makes it easy to correlate with other per-partition metrics and keeps
the context of the value explicit.
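
To make the intended meaning concrete, here is a minimal sketch of the ratio the metric name describes. It is an illustration only, not the KIP's implementation: the function name is hypothetical, and returning None when no size limit is set (retention.bytes = -1) is my assumption, since that case is not discussed here.

```python
from typing import Optional


def retention_size_in_percent(log_size_bytes: int, retention_bytes: int) -> Optional[float]:
    """Hypothetical helper: partition log size as a percentage of retention.bytes."""
    # retention.bytes = -1 means "no size limit" in Kafka, so the ratio is undefined.
    # Returning None for that case (and for 0) is an assumption of this sketch.
    if retention_bytes <= 0:
        return None
    return 100.0 * log_size_bytes / retention_bytes


# A 512 MiB partition with retention.bytes = 1 GiB is at 50% of its retention limit,
# regardless of how full the underlying disk is.
print(retention_size_in_percent(512 * 1024**2, 1024**3))  # 50.0
```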

Regarding a retention-time–based metric (for example, exposing the time
until the oldest segment expires), I considered this but am not convinced
it would be particularly actionable in practice. For topics older than the
configured retention time that have ongoing production, log segments tend
to remain near expiration continuously. This would cause such a metric to
hover close to zero in steady state. As a result, it may not provide a
meaningful operational signal to topic owners, since data older than the
retention time is expected to be eligible for deletion.
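
A rough sketch of why the value stays small, under stated assumptions: segments roll at a fixed interval (hourly here), retention.ms is 7 days, expired segments are removed as soon as they pass retention, and all function names are hypothetical. Under those assumptions the time until the oldest segment expires never exceeds one roll interval.

```python
HOUR_MS = 3_600_000
RETENTION_MS = 7 * 24 * HOUR_MS  # assumed retention.ms of 7 days


def retained_segment_timestamps(now_ms, roll_ms, retention_ms):
    # Largest timestamp of each closed segment, assuming one roll every roll_ms
    # since t = 0. A segment is kept while its age does not exceed retention_ms.
    return [t for t in range(roll_ms, now_ms + 1, roll_ms) if now_ms - t <= retention_ms]


def ms_until_oldest_segment_expires(segment_max_timestamps_ms, retention_ms, now_ms):
    # Hypothetical time-based metric: remaining lifetime of the oldest segment.
    oldest = min(segment_max_timestamps_ms)
    return max(0, retention_ms - (now_ms - oldest))


# A topic that has been produced to for 30 days and 10 minutes.
now_ms = 30 * 24 * HOUR_MS + 600_000
segments = retained_segment_timestamps(now_ms, HOUR_MS, RETENTION_MS)
remaining_ms = ms_until_oldest_segment_expires(segments, RETENTION_MS, now_ms)
print(remaining_ms)  # 3000000 ms, i.e. 50 minutes out of a 7-day retention
```

With these numbers the metric cycles between 0 and one hour indefinitely, which is under 1% of the retention window.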

For this reason, the current KIP focuses on size-based retention metrics,
which more directly indicate proximity to retention-triggered cleanup due
to storage pressure. I’m happy to revisit time-based metrics separately if
there are concrete use cases where they would add clear operational value.

Thanks again for the suggestions — please let me know if you see any other
areas that could benefit from clarification.


On Fri, Jan 16, 2026 at 3:49 PM Luke Chen <[email protected]> wrote:

> Hi Manan,
>
> Thanks for the KIP.
>
> 1. I agree this is a good improvement, but the naming is not clear IMO.
> "SizeInPercent" makes me think the disk is going to be full after it's
> 100%.
> Maybe "RetentionSizeInPercent"?
>
> 2. Do we need the similar metrics for the time retention?
> Like "RetentionTimeInSec", which is to show the time gap between oldest
> segment with the retention time, "RetentionTimeInSec = 300"  means the
> oldest segment will be expired after 300 seconds. Is that useful?
>
>
> Thank you,
> Luke
>
> On Mon, Jan 12, 2026 at 6:17 PM Manan Gupta <[email protected]> wrote:
>
> > Gentle reminder for feedback on the KIP-1257: Partition Size Percentage
> > Metrics for Storage Monitoring proposal.
> >
> > On Tue, Dec 16, 2025 at 5:34 PM Manan Gupta <[email protected]> wrote:
> >
> > > Hi all,
> > >
> > > This email starts the discussion thread for KIP-1257: Partition Size
> > > Percentage Metrics for Storage Monitoring. This KIP introduces
> > > retention-aware, percentage-based partition metrics that significantly
> > > improve Kafka’s storage observability. The proposed metrics simplify
> > > alerting, enhance capacity planning, and provide clear visibility into
> > > retention pressure—especially for tiered storage—while remaining
> > > lightweight, backward compatible, and operationally intuitive.
> > >
> > > I'd appreciate your initial thoughts and feedback on the proposal.
> > > https://cwiki.apache.org/confluence/x/MAEXG
> > >
> > >
> > > Thanks,
> > > Manan
> > >
> >
>
