Yes, I'd say they aren't related. Since you can repeat this issue
after a fresh VM boot, can you enable debug-level logging for said VM
(add "debug rbd = 20" to your ceph.conf) and recreate the issue? Just
to confirm, this VM doesn't have any features enabled besides
(perhaps) layering?
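
For example, in the client-side ceph.conf on the compute host (the [client]
section and log path below are just an assumed layout; restart or live-migrate
the VM afterwards so librbd picks up the change):

    [client]
        # verbose librbd logging for the affected guest
        debug rbd = 20
        log file = /var/log/ceph/qemu-guest.$pid.log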

On Fri, Jun 23, 2017 at 1:46 AM, Hall, Eric <[email protected]> wrote:
> The problem seems to be reliably reproducible after a fresh reboot of the VM…
>
> With this knowledge, I can cause the hung IO condition while having noscrub 
> and nodeep-scrub set.
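>
> (For reference, the flags were set roughly like this; they apply cluster-wide:)
>
>     ceph osd set noscrub
>     ceph osd set nodeep-scrub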
>
> Does this confirm this is not related to http://tracker.ceph.com/issues/20041?
>
> --
> Eric
>
> On 6/22/17, 11:23 AM, "Hall, Eric" <[email protected]> wrote:
>
>     After some testing (doing heavy IO on an rbd-based VM with 
> hung_task_timeout_secs=1 while manually requesting deep-scrubs on the 
> underlying PGs, as determined via rados ls -> osdmaptool), I don’t think 
> scrubbing is the cause.
>
>     At least, I can’t make it happen this way… although I can’t *always* make 
> it happen either.  I will continue testing as above, but suggestions on 
> improved test methodology are welcome.
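>
>     (Roughly, the test procedure was the following sketch; the pool name, 
>     object prefix, and pg id are placeholders:)
>
>     # inside the VM: complain after 1s of blocked IO
>     sysctl -w kernel.hung_task_timeout_secs=1
>     # on an admin node: map an image's objects to PGs, then deep-scrub them
>     rados -p <pool> ls | grep rbd_data.<image-prefix>
>     ceph osd getmap -o osdmap.bin
>     osdmaptool osdmap.bin --test-map-object <object-name> --pool <numeric-pool-id>
>     ceph pg deep-scrub <pgid>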
>
>
>     We occasionally see blocked requests in a running log (ceph -w > log), 
> but not correlated with hung VM IO.  Scrubbing doesn’t seem correlated either.
>
>     --
>     Eric
>
>     On 6/21/17, 2:55 PM, "Jason Dillaman" <[email protected]> wrote:
>
>         Do your VMs or OSDs show blocked requests? If you disable scrub or
>         restart the blocked OSD, does the issue go away? If yes, it most
>         likely is this issue [1].
>
>         [1] http://tracker.ceph.com/issues/20041
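>
>         (For example, after identifying the implicated OSD from "ceph health 
>         detail" or the blocked-requests warnings, it can be bounced with:)
>
>             systemctl restart ceph-osd@<id>
>             # or, on upstart-based Ubuntu 14.04: restart ceph-osd id=<id>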
>
>         On Wed, Jun 21, 2017 at 3:33 PM, Hall, Eric 
> <[email protected]> wrote:
>         > The VMs are using stock Ubuntu14/16 images so yes, there is the 
>         > default “/sbin/fstrim --all” in /etc/cron.weekly/fstrim.
>         >
>         > --
>         > Eric
>         >
>         > On 6/21/17, 1:58 PM, "Jason Dillaman" <[email protected]> wrote:
>         >
>         >     Are some or many of your VMs issuing periodic fstrims to discard
>         >     unused extents?
>         >
>         >     On Wed, Jun 21, 2017 at 2:36 PM, Hall, Eric 
> <[email protected]> wrote:
>         >     > After following/changing all suggested items (turning off 
> exclusive-lock
>         >     > (and associated object-map and fast-diff), changing host 
> cache behavior,
>         >     > etc.) this is still a blocking issue for many uses of our 
> OpenStack/Ceph
>         >     > installation.
>         >     >
>         >     >
>         >     >
>         >     > We have upgraded Ceph to 10.2.7, are running 4.4.0-62 or 
> later kernels on
>         >     > all storage, compute hosts, and VMs, with libvirt 1.3.1 on 
> compute hosts.
>         >     > Have also learned quite a bit about producing debug logs. ;)
>         >     >
>         >     >
>         >     >
>         >     > I’ve followed the related threads since March with bated 
> breath, but still
>         >     > find no resolution.
>         >     >
>         >     >
>         >     >
>         >     > Previously, I got pulled away before I could produce/report 
> discussed debug
>         >     > info, but am back on the case now. Please let me know how I 
> can help
>         >     > diagnose and resolve this problem.
>         >     >
>         >     >
>         >     >
>         >     > Any assistance appreciated,
>         >     >
>         >     > --
>         >     >
>         >     > Eric
>         >     >
>         >     >
>         >     >
>         >     > On 3/28/17, 3:05 AM, "Marius Vaitiekunas" 
> <[email protected]>
>         >     > wrote:
>         >     >
>         >     >
>         >     >
>         >     >
>         >     >
>         >     >
>         >     >
>         >     > On Mon, Mar 27, 2017 at 11:17 PM, Peter Maloney
>         >     > <[email protected]> wrote:
>         >     >
>         >     > I can't guarantee it's the same as my issue, but from your 
>         >     > description it sounds the same.
>         >     >
>         >     > Jewel 10.2.4, 10.2.5 tested
>         >     > hypervisors are proxmox qemu-kvm, using librbd
>         >     > 3 ceph nodes with mon+osd on each
>         >     >
>         >     > -faster journals, more disks, bcache, rbd_cache, fewer VMs on 
>         >     > ceph, iops and bw limits on client side, jumbo frames, etc. all 
>         >     > improve/smooth out performance and mitigate the hangs, but don't 
>         >     > prevent them.
>         >     > -hangs are usually associated with blocked requests (I set 
> the complaint
>         >     > time to 5s to see them)
>         >     > -hangs are very easily caused by rbd snapshot + rbd export-diff 
>         >     > to do incremental backup (one snap persistent, plus one more during 
>         >     > backup); a sketch of that workflow follows after this list
>         >     > -when qemu VM io hangs, I have to kill -9 the qemu process 
> for it to
>         >     > stop. Some broken VMs don't appear to be hung until I try to 
> live
>         >     > migrate them (live migrating all VMs helped test solutions)
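>         >     >
>         >     > (The snapshot/export-diff workflow mentioned above is roughly the 
>         >     > following sketch; pool, image, and snapshot names are placeholders:)
>         >     >
>         >     > rbd -p <pool> snap create <image>@base
>         >     > rbd -p <pool> snap create <image>@backup-<date>
>         >     > rbd -p <pool> export-diff --from-snap base \
>         >     >     <image>@backup-<date> /backup/<image>-<date>.diff
>         >     > rbd -p <pool> snap rm <image>@backup-<date>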
>         >     >
>         >     > Finally I have a workaround... disable exclusive-lock, 
> object-map, and
>         >     > fast-diff rbd features (and restart clients via live migrate).
>         >     > (object-map and fast-diff appear to have no effect on diff or 
> export-diff
>         >     > ... so I don't miss them). I'll file a bug at some point 
> (after I move
>         >     > all VMs back and see if it is still stable). And one other 
> user on IRC
>         >     > said this solved the same problem (also using rbd snapshots).
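>         >     >
>         >     > (As a sketch, the features can be dropped in dependency order; pool 
>         >     > and image names are placeholders, and running clients need a restart 
>         >     > or live migration to notice:)
>         >     >
>         >     > rbd -p <pool> feature disable <image> fast-diff object-map exclusive-lock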
>         >     >
>         >     > And strangely, they don't seem to hang if I put back those 
> features,
>         >     > until a few days later (making testing much less easy...but 
> now I'm very
>         >     > sure removing them prevents the issue)
>         >     >
>         >     > I hope this works for you (and maybe gets some attention from 
> devs too),
>         >     > so you don't waste months like me.
>         >     >
>         >     >
>         >     > On 03/27/17 19:31, Hall, Eric wrote:
>         >     >> In an OpenStack (mitaka) cloud, backed by a ceph cluster 
> (10.2.6 jewel),
>         >     >> using libvirt/qemu (1.3.1/2.5) hypervisors on Ubuntu 14.04.5 
> compute and
>         >     >> ceph hosts, we occasionally see hung processes (usually 
> during boot, but
>         >     >> otherwise as well), with errors reported in the instance 
> logs as shown
>         >     >> below.  Configuration is vanilla, based on openstack/ceph 
> docs.
>         >     >>
>         >     >> Neither the compute hosts nor the ceph hosts appear to be 
> overloaded in
>         >     >> terms of memory or network bandwidth, none of the 67 osds 
> are over 80% full,
>         >     >> nor do any of them appear to be overwhelmed in terms of IO.  
> Compute hosts
>         >     >> and ceph cluster are connected via a relatively quiet 1Gb 
> network, with an
>         >     >> IBoE net between the ceph nodes.  Neither network appears 
> overloaded.
>         >     >>
>         >     >> I don’t see any related (to my eye) errors in client or 
> server logs, even
>         >     >> with 20/20 logging from various components (rbd, rados, 
> client,
>         >     >> objectcacher, etc.)  I’ve increased the qemu file descriptor 
> limit
>         >     >> (currently 64k... overkill for sure.)
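>         >     >>
>         >     >> (For reference, one way to raise that limit, assuming it was done via 
>         >     >> libvirt's qemu.conf rather than plain ulimits:)
>         >     >>
>         >     >> # /etc/libvirt/qemu.conf
>         >     >> max_files = 65536
>         >     >> # then restart libvirt-bin and restart or migrate the guests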
>         >     >>
>         >     >> It “feels” like a performance problem, but I can’t find any 
> capacity issues
>         >     >> or constraining bottlenecks.
>         >     >>
>         >     >> Any suggestions or insights into this situation are 
> appreciated.  Thank
>         >     >> you for your time,
>         >     >> --
>         >     >> Eric
>         >     >>
>         >     >>
>         >     >> [Fri Mar 24 20:30:40 2017] INFO: task jbd2/vda1-8:226 
> blocked for more
>         >     >> than 120 seconds.
>         >     >> [Fri Mar 24 20:30:40 2017]       Not tainted 
> 3.13.0-52-generic #85-Ubuntu
>         >     >> [Fri Mar 24 20:30:40 2017] "echo 0 >
>         >     >> /proc/sys/kernel/hung_task_timeout_secs" disables this 
> message.
>         >     >> [Fri Mar 24 20:30:40 2017] jbd2/vda1-8     D 
> ffff88043fd13180     0   226
>         >     >> 2 0x00000000
>         >     >> [Fri Mar 24 20:30:40 2017]  ffff88003728bbd8 0000000000000046
>         >     >> ffff880426900000 ffff88003728bfd8
>         >     >> [Fri Mar 24 20:30:40 2017]  0000000000013180 0000000000013180
>         >     >> ffff880426900000 ffff88043fd13a18
>         >     >> [Fri Mar 24 20:30:40 2017]  ffff88043ffb9478 0000000000000002
>         >     >> ffffffff811ef7c0 ffff88003728bc50
>         >     >> [Fri Mar 24 20:30:40 2017] Call Trace:
>         >     >> [Fri Mar 24 20:30:40 2017]  [<ffffffff811ef7c0>] ?
>         >     >> generic_block_bmap+0x50/0x50
>         >     >> [Fri Mar 24 20:30:40 2017]  [<ffffffff81726d2d>] 
> io_schedule+0x9d/0x140
>         >     >> [Fri Mar 24 20:30:40 2017]  [<ffffffff811ef7ce>] 
> sleep_on_buffer+0xe/0x20
>         >     >> [Fri Mar 24 20:30:40 2017]  [<ffffffff817271b2>] 
> __wait_on_bit+0x62/0x90
>         >     >> [Fri Mar 24 20:30:40 2017]  [<ffffffff811ef7c0>] ?
>         >     >> generic_block_bmap+0x50/0x50
>         >     >> [Fri Mar 24 20:30:40 2017]  [<ffffffff81727257>]
>         >     >> out_of_line_wait_on_bit+0x77/0x90
>         >     >> [Fri Mar 24 20:30:40 2017]  [<ffffffff810ab180>] ?
>         >     >> autoremove_wake_function+0x40/0x40
>         >     >> [Fri Mar 24 20:30:40 2017]  [<ffffffff811f0afa>]
>         >     >> __wait_on_buffer+0x2a/0x30
>         >     >> [Fri Mar 24 20:30:40 2017]  [<ffffffff8128bb4d>]
>         >     >> jbd2_journal_commit_transaction+0x185d/0x1ab0
>         >     >> [Fri Mar 24 20:30:40 2017]  [<ffffffff810755df>] ?
>         >     >> try_to_del_timer_sync+0x4f/0x70
>         >     >> [Fri Mar 24 20:30:40 2017]  [<ffffffff8128fe7d>] 
> kjournald2+0xbd/0x250
>         >     >> [Fri Mar 24 20:30:40 2017]  [<ffffffff810ab140>] ?
>         >     >> prepare_to_wait_event+0x100/0x100
>         >     >> [Fri Mar 24 20:30:40 2017]  [<ffffffff8128fdc0>] ?
>         >     >> commit_timeout+0x10/0x10
>         >     >> [Fri Mar 24 20:30:40 2017]  [<ffffffff8108b5d2>] 
> kthread+0xd2/0xf0
>         >     >> [Fri Mar 24 20:30:40 2017]  [<ffffffff8108b500>] ?
>         >     >> kthread_create_on_node+0x1c0/0x1c0
>         >     >> [Fri Mar 24 20:30:40 2017]  [<ffffffff8173304c>] 
> ret_from_fork+0x7c/0xb0
>         >     >> [Fri Mar 24 20:30:40 2017]  [<ffffffff8108b500>] ?
>         >     >> kthread_create_on_node+0x1c0/0x1c0
>         >     >>
>         >     >>
>         >     >>
>         >     >
>         >     >
>         >     >
>         >     >
>         >     >
>         >     > Hi,
>         >     >
>         >     >
>         >     >
>         >     > We are using these settings on hypervisors in openstack:
>         >     >
>         >     > vm.dirty_ratio = 40
>         >     >
>         >     > vm.dirty_background_ratio = 5
>         >     >
>         >     >
>         >     >
>         >     > And these on vms:
>         >     >
>         >     > vm.dirty_ratio = 10
>         >     >
>         >     > vm.dirty_background_ratio = 5
>         >     >
>         >     >
>         >     >
>         >     > In our case it prevents vms from crashing.
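>         >     >
>         >     > (Applied persistently with a sysctl drop-in, for example; the file 
>         >     > name is arbitrary:)
>         >     >
>         >     > # /etc/sysctl.d/60-dirty.conf on the hypervisor
>         >     > vm.dirty_ratio = 40
>         >     > vm.dirty_background_ratio = 5
>         >     > # apply without reboot: sysctl --system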
>         >     >
>         >     >
>         >     >
>         >     > --
>         >     >
>         >     > Marius Vaitiekūnas
>         >     >
>         >     >
>         >     >
>         >
>         >
>         >
>         >     --
>         >     Jason
>         >
>         >
>
>
>
>         --
>         Jason
>
>
>
>



-- 
Jason
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
