Re: [DISCUSS] KIP-551: Expose disk read and write metrics

Colin McCabe Thu, 09 Jan 2020 16:59:15 -0800

On Thu, Jan 9, 2020, at 16:39, Jose Garcia Sancio wrote:
> Thanks Colin,
> 
> LGTM in general. The Linux documentation (
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/filesystems/proc.txt?id=HEAD#n1644)
> defines these metrics as
> 
> > read_bytes
> > ----------
> >
> > I/O counter: bytes read
> > Attempt to count the number of bytes which this process really did cause to
> > be fetched from the storage layer. Done at the submit_bio() level, so it is
> > accurate for block-backed filesystems. <please add status regarding NFS and
> > CIFS at a later time>
> >
> >
> > write_bytes
> > -----------
> >
> > I/O counter: bytes written
> > Attempt to count the number of bytes which this process caused to be sent
> > to the storage layer. This is done at page-dirtying time.
> >
> 
> It looks like there is also another metric (cancelled_write_bytes) that
> affects the value reported in write_bytes. Do we want to take that into
> account when reporting the JMX metric
> kafka.server:type=KafkaServer,name=DiskWriteBytes?
> 
> > cancelled_write_bytes
> > ---------------------
> >
> > The big inaccuracy here is truncate. If a process writes 1MB to a file and
> > then deletes the file, it will in fact perform no writeout. But it will
> > have been accounted as having caused 1MB of write.
> > In other words: The number of bytes which this process caused to not
> > happen, by truncating pagecache. A task can cause "negative" IO too. If
> > this task truncates some dirty pagecache, some IO which another task has
> > been accounted for (in its write_bytes) will not be happening. We _could_
> > just subtract that from the truncating task's write_bytes, but there is
> > information loss in doing that.


Hi Jose,

That's a good point, which I had overlooked!  I think we should just subtract 
out the cancelled_write_bytes, since it doesn't reflect actual I/O that was 
done.

I don't think the "cancelled" number typically gets that big, but there isn't 
really a reason to count bytes which we intended to write out but then never 
did.  I added a discussion of this to the KIP.
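
To make the subtraction concrete, here is a minimal sketch rather than the
actual broker code. It assumes the "key: value" line format of /proc/self/io.
The class name ProcIoSketch and its helper methods are hypothetical, and
clamping the result at zero is my own assumption, not something the KIP
specifies.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Kafka.
class ProcIoSketch {
    // Parses "key: value" lines such as "write_bytes: 1048576".
    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> fields = new HashMap<>();
        for (String line : lines) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // not a "key: value" line
            }
            String key = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            try {
                fields.put(key, Long.parseLong(value));
            } catch (NumberFormatException e) {
                // Ignore lines whose value is not a plain integer.
            }
        }
        return fields;
    }

    // write_bytes minus cancelled_write_bytes. The clamp at zero is an
    // assumption: a task can cancel writes that another task was charged for.
    static long effectiveWriteBytes(Map<String, Long> fields) {
        long written = fields.getOrDefault("write_bytes", 0L);
        long cancelled = fields.getOrDefault("cancelled_write_bytes", 0L);
        return Math.max(0L, written - cancelled);
    }

    public static void main(String[] args) throws IOException {
        // Linux only: /proc/self/io describes the current process.
        Map<String, Long> fields =
            parse(Files.readAllLines(Paths.get("/proc/self/io")));
        System.out.println("read_bytes = " + fields.getOrDefault("read_bytes", 0L));
        System.out.println("effective write bytes = " + effectiveWriteBytes(fields));
    }
}
```

Keeping the parsing separate from the file read means the arithmetic can be
checked on any platform by passing in sample lines.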

best,
Colin

> 
> 
> 
> On Mon, Jan 6, 2020 at 5:28 PM Colin McCabe <cmcc...@apache.org> wrote:
> 
> > On Tue, Dec 10, 2019, at 11:10, Magnus Edenhill wrote:
> > > Hi Colin,
> > >
> >
> > Hi Magnus,
> >
> > Thanks for taking a look.
> >
> > > aren't those counters (ever increasing), rather than gauges
> > > (fluctuating)?
> >
> > Since this is in the Kafka broker, we're using Yammer.  This might be
> > confusing, but Yammer's concept of a "counter" is not actually monotonic.
> > It can decrease as well as increase.
> >
> > In general Yammer counters require you to call inc(amount) or dec(amount)
> > on them.  This doesn't match up with what we need to do here, which is to
> > (essentially) make a callback into the kernel by reading from /proc.
> >
> > The counter/gauge dichotomy doesn't affect what is exposed over JMX (I
> > think?), so it's really an implementation detail.
> >
> > >
> > > You also mention CPU usage as a side note. You could use getrusage(2)'s
> > > ru_utime (user) and ru_stime (sys)
> > > to allow the broker to monitor its own CPU usage.
> > >
> >
> > Interesting idea.  It might be better to save that for a future KIP,
> > though, to avoid scope creep.
> >
> > best,
> > Colin
> >
> > > /Magnus
> > >
> > > On Tue, Dec 10, 2019 at 19:33, Colin McCabe <cmcc...@apache.org> wrote:
> > >
> > > > Hi all,
> > > >
> > > > I wrote a KIP about adding support for exposing disk read and write
> > > > metrics.  Check it out here:
> > > >
> > > > https://cwiki.apache.org/confluence/x/sotSC
> > > >
> > > > best,
> > > > Colin
> > > >
> > >
> >
> 
> 
> -- 
> -Jose
>
