Hello,
I have a strange situation:
On a host server we are running 5 VMs. Their disks are provisioned by Cinder
from a Ceph cluster and are attached by qemu-kvm using librbd.
The VMs apparently stopped working for a few seconds (10-20) and then resumed
their normal operation.
I only have access to the host system. Checking the values reported by sar I
can see the following:
A slight iowait appears on the host (the problem occurred between 12:29:35 and
12:29:55):
12:05:01 PM  CPU  %user  %nice  %system  %iowait  %steal  %idle
12:15:01 PM  all   3.16   0.00     0.55     0.00    0.00  96.29
12:25:01 PM  all   3.34   0.00     0.73     0.00    0.00  95.93
12:35:01 PM  all   3.65   0.00     0.94     1.44    0.00  93.97
<----- iowait is 1.44 the only value different than 0 for the whole day
12:45:01 PM  all   3.27   0.00     0.65     0.00    0.00  96.08
12:55:01 PM  all   3.18   0.00     0.58     0.00    0.00  96.24
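As a rough sanity check on that 1.44 figure, the sketch below scales the interval average down to the stall window. It assumes the sar interval is 600 seconds (inferred from the timestamps) and that all of the iowait fell inside the 20-second window, which is only suggested by the other intervals showing 0.00. The function name is made up for illustration.

```python
def iowait_in_window(avg_pct, interval_s, window_s):
    """Scale an interval-averaged %iowait to a shorter window.

    Assumption: every bit of iowait in the interval happened inside
    the window, so the average is simply diluted by interval/window.
    """
    return avg_pct * interval_s / window_s


# 1.44 % averaged over a 600 s sar interval, stall of about 20 s
print(round(iowait_in_window(1.44, 600, 20), 1))
```

Under those assumptions the host would have spent roughly 43% of its CPU time in iowait during the 20 seconds, so the small average does not by itself mean the event was small.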
The only disk-based filesystem is / :
$ df -h
Filesystem  Size  Used  Avail  Use%  Mounted on
udev         63G   12K    63G    1%  /dev
tmpfs        13G  1.4M    13G    1%  /run
/dev/sda1   275G   13G   249G    5%  /
none        4.0K     0   4.0K    0%  /sys/fs/cgroup
none        5.0M     0   5.0M    0%  /run/lock
none         63G   12K    63G    1%  /run/shm
none        100M     0   100M    0%  /run/user
while the sar values for the disk do not show anything unusual:
12:05:01 PM  DEV      tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz  await  svctm  %util
12:15:01 PM  dev8-0  0.79      0.00     25.20     31.76      0.00   0.00   0.00   0.00
12:25:01 PM  dev8-0  0.80      0.00     25.52     31.90      0.00   0.00   0.00   0.00
12:35:01 PM  dev8-0  0.76      0.01     25.15     33.18      0.00   0.01   0.01   0.00
12:45:01 PM  dev8-0  0.79      0.00     25.01     31.46      0.00   0.01   0.01   0.00
12:55:01 PM  dev8-0  0.80      0.00     25.84     32.44      0.00   0.00   0.00   0.00
Average:     dev8-0  0.79      0.00     25.34     32.14      0.00   0.00   0.00   0.00
The VMs have their disks on a Ceph cluster and access them using librbd.
I can see a traffic peak on the storage interface:
12:05:01 PM  IFACE  rxpck/s  txpck/s  rxkB/s  txkB/s  rxcmp/s  txcmp/s  rxmcst/s  %ifutil
12:15:01 PM  vlan6   157.49   148.81   42.45  328.90     0.00     0.00      0.00     0.00
12:25:01 PM  vlan6   154.82   148.44   41.97  327.32     0.00     0.00      0.00     0.00
12:35:01 PM  vlan6   157.22   154.34   47.12  505.42     0.00     0.00      0.00     0.00
<----- txkB goes up to 505 from an average of 328
12:45:01 PM  vlan6   152.60   147.00   41.15  319.85     0.00     0.00      0.00     0.00
12:55:01 PM  vlan6   156.09   147.38   42.22  323.50     0.00     0.00      0.00     0.00
Average:     vlan6   155.64   149.19   42.98  361.00     0.00     0.00      0.00     0.00
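To put the txkB/s peak into absolute terms, here is a small sketch that estimates how much extra data left the host during the 12:25-12:35 interval. It assumes a 600-second interval, takes the mean of the four other intervals as the baseline, and treats sar's kB as 1024 bytes; the helper name is made up for illustration.

```python
def extra_kb(peak_kb_s, baseline_kb_s, interval_s):
    """Data sent above the baseline rate over one interval, in kB."""
    return (peak_kb_s - baseline_kb_s) * interval_s


# Baseline: mean txkB/s of the four intervals without the peak
baseline = (328.90 + 327.32 + 319.85 + 323.50) / 4

# Peak interval: 505.42 kB/s averaged over an assumed 600 s
extra = extra_kb(505.42, baseline, 600)

print(f"baseline {baseline:.1f} kB/s, extra about {extra / 1024:.0f} MB")
```

This works out to roughly 106 MB of additional outbound traffic on vlan6 in that interval. The sar data cannot tell whether it was sent during the 20-second stall or right after it.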
My question is: is it possible that the librbd access to the Ceph cluster
caused the iowait value observed on the host?
Thank you,
Laszlo