On Apr 5, 2018, at 19:44, Faaland, Olaf P. <faala...@llnl.gov> wrote:
> 
> Hi,
> 
> I have a couple of questions about these stats.  If these are documented 
> somewhere, by all means point me to them.  What I found in the operations 
> manual and on the web did not answer my questions.
> 
> What do
> 
> read_bytes                25673 samples [bytes] 1 3366225 145121869
> write_bytes               13641 samples [bytes] 1 3366225 468230469
> 
> mean in more detail?  I understand that the last three values are 
> MIN/MAX/SUM, and that their units are bytes, and that they reflect activity 
> since the file system was mounted or since the stats were last cleared.  But 
> more specifically:
> 
> samples:  Is this the number of requests issued to servers, e.g. RPC issued 
> with opcode OST_READ?  

No, the stats in the llite.*.stats file are "llite level" stats, i.e. they 
count the VFS operations seen by the client. If you want RPC-level stats you 
need to look at osc.*.stats.
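
Both levels can be dumped on the client with lctl; the device names embedded 
in the output depend on the filesystem name and the mount instance, so they 
will differ on your system:

# lctl get_param llite.*.stats
# lctl get_param osc.*.stats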

> So if the user called read() 200 times on the same 1K file, which didn't ever 
> change and remained cached by the lustre client, and all the data was fetched 
> in a single RPC in the first place, then samples would be 1?  
> 
> And in that case, would the sum be 1K rather than 200K?

Simple testing shows that the read_bytes line counts the number of read() 
syscalls and the total number of bytes returned by those syscalls (not the 
data fetched from the OST), even though both reads below are served from the 
client cache:

# lctl set_param llite.*.stats=clear
llite.testfs-ffff880007524000.stats=clear
# dd if=/dev/zero of=/mnt/testfs/ff bs=1M count=1
1048576 bytes (1.0 MB) copied, 0.00220207 s, 476 MB/s
# dd of=/dev/null if=/mnt/testfs/ff bs=1k count=1k
1048576 bytes (1.0 MB) copied, 0.00197065 s, 532 MB/s
# dd of=/dev/null if=/mnt/testfs/ff bs=1k count=1k
1048576 bytes (1.0 MB) copied, 0.00188529 s, 556 MB/s
# lctl get_param llite.*.stats
llite.testfs-ffff880007524000.stats=
snapshot_time             1523008010.817348638 secs.nsecs
read_bytes                2048 samples [bytes] 1024 1024 2097152
write_bytes               1 samples [bytes] 1048576 1048576 1048576
open                      3 samples [regs]
close                     3 samples [regs]
seek                      2 samples [regs]
truncate                  1 samples [regs]
getxattr                  1 samples [regs]
removexattr               1 samples [regs]
inode_permission          7 samples [regs]

In the llite stats above, the two dd reads appear as 2048 samples of 1024 
bytes each, 2097152 bytes (2MB) in total, twice the file size because the file 
was read twice. Checking the OSC-level stats shows that there was a single 
write RPC of 1MB, and no read RPC at all, since the data remained in the 
client cache (the extra fourth value on the OSC lines is the sum of squares of 
the samples, used to compute the standard deviation):

# lfs getstripe -i /mnt/testfs/ff
2
# lctl get_param osc.testfs-OST0002*.stats
osc.testfs-OST0002-osc-ffff880007524000.stats=
snapshot_time             1523008200.913698356 secs.nsecs
req_waittime              83 samples [usec] 119 2461 51353 41125171
req_active                83 samples [reqs] 1 1 83 83
ldlm_extent_enqueue       1 samples [reqs] 1 1 1 1
write_bytes               1 samples [bytes] 1048576 1048576 1048576 1099511627776
ost_write                 1 samples [usec] 2461 2461 2461 6056521
ost_connect               1 samples [usec] 280 280 280 78400
ost_punch                 1 samples [usec] 291 291 291 84681
ost_statfs                1 samples [usec] 119 119 119 14161
obd_ping                  78 samples [usec] 164 1352 46717 29485783

Similarly, the ost.OSS.ost_io.stats file on the OSS will show the RPC stats 
for the whole server, while obdfilter.testfs-OST0002.stats will show the RPCs 
handled by this particular target, and osd-*.testfs-OST0002.brw_stats will 
show how the write was submitted to disk (it will not show any read).  If a 
read is served from the OSS read cache, it will appear at the ost_io and 
obdfilter levels, but not at the osd-* level, since no IO actually went to 
disk.

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation
