Thanks for the detailed proposal — this looks useful for improving I/O observability.

Regarding the metric type: the proposal describes these metrics as *monotonically increasing counters*, but they are exposed as *Gauge* metrics. Since /proc/self/io values only increase during the process lifetime and reset on restart, would it make more sense to expose them as counters instead? This might make rate calculations more straightforward for monitoring systems that expect cumulative metrics for deriving rates.

Since /proc/self/io provides *process-level metrics*, these values will aggregate I/O across all Kafka log directories. In deployments where brokers use *multiple log.dirs across different disks*, operators often rely on per-disk observability to diagnose hotspots or imbalanced usage. Do you see any risk that these metrics could be misinterpreted as disk-level signals, or should the documentation explicitly clarify that they reflect *aggregate broker process I/O* rather than per-disk activity?

Thanks,
Manan

On Mon, Mar 9, 2026 at 9:37 PM Sahil Devgon <[email protected]> wrote:

> Hi Team, just checking if you were able to review the KIP and have some
> comments or suggestions on this thread.
> https://cwiki.apache.org/confluence/x/co48G
>
> If there are no comments, I intend to start a Vote thread for the same in
> the coming days.
>
> Thanks,
> Sahil Devgon
>
> On Wed, Feb 25, 2026 at 8:20 PM Sahil Devgon <[email protected]> wrote:
>
> > Hi,
> >
> > I would like to start a discussion thread on KIP-1291. In this KIP, we
> > aim to expose all 7 Linux I/O metrics from /proc/self/io instead of just
> > the current 2 (read_bytes and write_bytes).
> > The 5 additional metrics (rchar, wchar, syscr, syscw,
> > cancelled_write_bytes) enable operators to diagnose cache effectiveness,
> > write amplification, and I/O pattern inefficiencies.
> >
> > https://cwiki.apache.org/confluence/x/co48G
> >
> > Please review the KIP and feel free to share your thoughts.
> >
> > Thanks,
> > Sahil Devgon
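
P.S. To make the counter question above concrete, here is a rough sketch of how a monitoring system might derive a rate from two cumulative /proc/self/io samples, treating a decrease as a process restart. This is only an illustration, not what the KIP or Kafka implements: the function names `parse_proc_io` and `rate_per_second` are hypothetical, and the sample values are made up.

```python
from typing import Dict, Optional


def parse_proc_io(text: str) -> Dict[str, int]:
    """Parse the 'key: value' lines of a /proc/<pid>/io snapshot into ints."""
    counters: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Skip anything that is not a 'name: <non-negative integer>' line.
        if sep and value.strip().isdigit():
            counters[key.strip()] = int(value)
    return counters


def rate_per_second(prev: int, curr: int, interval_s: float) -> Optional[float]:
    """Per-second rate between two cumulative samples.

    Returns None when the value went backwards, which for a cumulative
    counter is taken here to mean the process restarted between samples.
    """
    if interval_s <= 0 or curr < prev:
        return None
    return (curr - prev) / interval_s


# Made-up snapshots taken 10 seconds apart (illustrative values only).
SAMPLE_T0 = (
    "rchar: 1000\nwchar: 500\nsyscr: 10\nsyscw: 5\n"
    "read_bytes: 4096\nwrite_bytes: 8192\ncancelled_write_bytes: 0\n"
)
SAMPLE_T1 = (
    "rchar: 3000\nwchar: 2500\nsyscr: 30\nsyscw: 25\n"
    "read_bytes: 8192\nwrite_bytes: 18192\ncancelled_write_bytes: 0\n"
)

if __name__ == "__main__":
    t0 = parse_proc_io(SAMPLE_T0)
    t1 = parse_proc_io(SAMPLE_T1)
    for name in t0:
        print(name, rate_per_second(t0[name], t1[name], 10.0))
```

With counter semantics the reset handling can live in the monitoring system; with a Gauge, each consumer would need to know for itself that the value is cumulative before applying something like this.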
