On Mon, Aug 26, 2019 at 5:01 AM Wido den Hollander <w...@42on.com> wrote:
>
>
>
> On 8/22/19 5:49 PM, Jason Dillaman wrote:
> > On Thu, Aug 22, 2019 at 11:29 AM Wido den Hollander <w...@42on.com> wrote:
> >>
> >>
> >>
> >> On 8/22/19 3:59 PM, Jason Dillaman wrote:
> >>> On Thu, Aug 22, 2019 at 9:23 AM Wido den Hollander <w...@42on.com> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> In a couple of situations I have seen Virtual Machines running on
> >>>> RBD show a high I/O-wait, nearly 100%, on their vdX (VirtIO) or sdX
> >>>> (Virtio-SCSI) devices while they were performing CPU-intensive
> >>>> tasks.
> >>>>
> >>>> These servers would be running a very CPU-intensive application while
> >>>> *not* doing much disk I/O.
> >>>>
> >>>> I however noticed that the I/O-wait of the disk(s) in the VM went up to
> >>>> 100%.
> >>>>
> >>>> This VM is CPU limited by Libvirt by putting that KVM process in its
> >>>> own cgroup with a CPU limitation.
> >>>>
> >>>> Now, my theory is:
> >>>>
> >>>> KVM (qemu-kvm) is completely userspace and librbd runs inside qemu-kvm
> >>>> as a library. All threads for disk I/O are part of the same PID and thus
> >>>> part of that cgroup.
> >>>>
> >>>> If a process inside the Virtual Machine now starts to consume all CPU
> >>>> time, there is nothing left for librbd, which slows it down.
> >>>>
> >>>> This then causes an increased I/O-wait inside the Virtual Machine, even
> >>>> though the VM is not performing a lot of disk I/O.
> >>>>
> >>>>
> >>>> Is my theory sane?
> >>>
> >>> Yes, I would say that your theory is sane. Have you looked into
> >>> libvirt's cgroup controls for limiting the emulator portion vs the
> >>> vCPUs [1]? I'd hope the librbd code and threads would be running in
> >>> the emulator cgroup (in a perfect world).
> >>>
> >>
> >> I checked with 'virsh schedinfo X' and this is the output I got:
> >>
> >> Scheduler      : posix
> >> cpu_shares     : 1000
> >> vcpu_period    : 100000
> >> vcpu_quota     : -1
> >> emulator_period: 100000
> >> emulator_quota : -1
> >> global_period  : 100000
> >> global_quota   : -1
> >> iothread_period: 100000
> >> iothread_quota : -1
> >>
> >>
> >> How can we confirm if the librbd code runs inside the Emulator part?
> >
> > You can look under the "/proc/<QEMU PID>/task/<THREAD>/" directories.
> > The "comm" file has the thread friendly name. If it's a librbd /
> > librados thread you will see things like the following (taken from an
> > 'rbd bench-write' process):
> >
> > $ cat */comm
> > rbd
> > log
> > service
> > admin_socket
> > msgr-worker-0
> > msgr-worker-1
> > msgr-worker-2
> > rbd
> > ms_dispatch
> > ms_local
> > safe_timer
> > fn_anonymous
> > safe_timer
> > safe_timer
> > fn-radosclient
> > tp_librbd
> > safe_timer
> > safe_timer
> > taskfin_librbd
> > signal_handler
> >
> > Those directories also have "cgroup" files which will indicate which
> > cgroup the thread is currently living under. For example, the
> > "tp_librbd" thread is running under the following cgroups in my
> > environment:
> >
> > 11:blkio:/
> > 10:hugetlb:/
> > 9:freezer:/
> > 8:net_cls,net_prio:/
> > 7:memory:/user.slice/user-1000.slice/user@1000.service
> > 6:cpu,cpuacct:/
> > 5:devices:/user.slice
> > 4:perf_event:/
> > 3:cpuset:/
> > 2:pids:/user.slice/user-1000.slice/user@1000.service
> > 1:name=systemd:/user.slice/user-1000.slice/user@1000.service/gnome-terminal-server.service
> > 0::/user.slice/user-1000.slice/user@1000.service/gnome-terminal-server.service
> >
>
> I checked:
>
> root@n01:/proc/3668710/task# cat 3668748/comm
> tp_librbd
> root@n01:/proc/3668710/task#
>
> So that seems to be rbd, right? I also checked the 'fn-radosclient' thread.
>
> root@n01:/proc/3668710/task# cat 3668748/cgroup
> 12:hugetlb:/
> 11:memory:/machine/i-1551-77-VM.libvirt-qemu
> 10:freezer:/machine/i-1551-77-VM.libvirt-qemu
> 9:pids:/system.slice/libvirt-bin.service
> 8:rdma:/
> 7:cpu,cpuacct:/machine/i-1551-77-VM.libvirt-qemu/emulator
> 6:blkio:/machine/i-1551-77-VM.libvirt-qemu
> 5:cpuset:/machine/i-1551-77-VM.libvirt-qemu/emulator
> 4:devices:/machine/i-1551-77-VM.libvirt-qemu
> 3:perf_event:/machine/i-1551-77-VM.libvirt-qemu
> 2:net_cls,net_prio:/machine/i-1551-77-VM.libvirt-qemu
> 1:name=systemd:/system.slice/libvirt-bin.service
> root@n01:/proc/3668710/task#
>
> It seems that this RBD thread is in the 'emulator' cgroup, isn't it?
>
> Is this what we want?

Yup, that looks good to me. I would then double-check your cgroups to
see where the CPU restriction is being placed. If it's only at
"/machine/i-1551-77-VM.libvirt-qemu", then the emulator and vcpu
cgroups share that CPU time, as opposed to each vcpu having its own
restriction.
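
If it helps, here is a rough Python sketch of that double-check. It is
only an illustration under assumptions, not something taken from
libvirt or librbd: it assumes a cgroup v1 host with the cpu controller
mounted at /sys/fs/cgroup/cpu,cpuacct (adjust cpu_root if yours
differs), and the function names are made up for this example. It pulls
the cpu cgroup path out of a thread's "cgroup" file and reads
cpu.cfs_quota_us and cpu.cfs_period_us at every level of that path, so
you can see at which level a quota other than -1 (no limit) is set.

```python
from pathlib import Path


def cpu_cgroup_path(cgroup_text):
    """Return the cpu controller's cgroup path from the contents of a
    /proc/<pid>/task/<tid>/cgroup file, or None if it is not listed."""
    for line in cgroup_text.splitlines():
        # Each line looks like "<id>:<controller,list>:<path>".
        parts = line.strip().split(":", 2)
        if len(parts) != 3:
            continue
        if "cpu" in parts[1].split(","):
            return parts[2]
    return None


def ancestor_paths(cgroup_path):
    """List the cgroup and each of its ancestors, outermost first."""
    parts = [p for p in cgroup_path.split("/") if p]
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts) + 1)]


def read_cpu_limits(cgroup_path, cpu_root="/sys/fs/cgroup/cpu,cpuacct"):
    """Return (path, quota_us, period_us) for every level of cgroup_path.

    cpu_root is an assumed cgroup v1 mount point. Levels whose files
    cannot be read are reported with None values.
    """
    limits = []
    for path in ancestor_paths(cgroup_path):
        base = Path(cpu_root + path)
        try:
            quota = int((base / "cpu.cfs_quota_us").read_text())
            period = int((base / "cpu.cfs_period_us").read_text())
        except (OSError, ValueError):
            quota = period = None
        limits.append((path, quota, period))
    return limits


if __name__ == "__main__":
    # PID and thread ID taken from the output quoted above.
    text = Path("/proc/3668710/task/3668748/cgroup").read_text()
    for path, quota, period in read_cpu_limits(cpu_cgroup_path(text)):
        print(path, quota, period)
```

A quota of -1 at the "emulator" level combined with a positive quota on
the parent would mean the emulator threads, librbd included, compete
with the vCPUs for the same budget.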

> Wido
>
> >
> >> Wido
> >>
> >>>> Can somebody confirm this?
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Wido
> >>>> _______________________________________________
> >>>> ceph-users mailing list
> >>>> ceph-users@lists.ceph.com
> >>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>>
> >>> [1] https://libvirt.org/cgroups.html
> >>>
> >
> >
> >



-- 
Jason
