Re: [ceph-users] disk timeouts in libvirt/qemu VMs...

Jason Dillaman Tue, 28 Mar 2017 07:54:15 -0700

Eric,

If you already have debug level 20 logs captured from one of these
events, I would love to take a look at them to see what's going on.
Depending on the size, you could either attach the log to a new RBD
tracker ticket [1] or, for a large file, upload it with the
ceph-post-file helper.

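A debug level 20 client log that covers a long run can be very large. It may help to find the stall in the guest first and then trim the log to that window before attaching or uploading it. Below is a minimal Python sketch of that first step. It assumes the guest kernel lines carry human-readable timestamps like the excerpt quoted below. `find_hung_tasks` and `HungTask` are hypothetical names and are not part of Ceph or any existing tool.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, List, NamedTuple

# Hypothetical helper (not part of Ceph): matches guest-kernel hung-task reports like
# "[Fri Mar 24 20:30:40 2017] INFO: task jbd2/vda1-8:226 blocked for more than 120 seconds."
HUNG_TASK_RE = re.compile(
    r"^\[(?P<stamp>[^\]]+)\]\s+INFO: task (?P<task>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)

# Assumed timestamp format, as in the excerpt: "Fri Mar 24 20:30:40 2017".
STAMP_FORMAT = "%a %b %d %H:%M:%S %Y"


class HungTask(NamedTuple):
    reported_at: datetime     # when the guest kernel printed the warning
    task: str                 # blocked task name, e.g. "jbd2/vda1-8"
    pid: int
    threshold_secs: int       # hung-task timeout quoted in the message
    stalled_before: datetime  # the task was already blocked at this time


def find_hung_tasks(lines: Iterable[str]) -> List[HungTask]:
    """Collect hung-task reports so the matching window can be cut from a client log."""
    events = []
    for line in lines:
        match = HUNG_TASK_RE.search(line)
        if match is None:
            continue  # not a hung-task header line (stack dump, call trace, etc.)
        reported_at = datetime.strptime(match["stamp"], STAMP_FORMAT)
        secs = int(match["secs"])
        events.append(HungTask(
            reported_at=reported_at,
            task=match["task"],
            pid=int(match["pid"]),
            threshold_secs=secs,
            # Blocked for more than `secs` seconds at report time, so the
            # stall began earlier than reported_at - secs.
            stalled_before=reported_at - timedelta(seconds=secs),
        ))
    return events


if __name__ == "__main__":
    sample = [
        "[Fri Mar 24 20:30:40 2017] INFO: task jbd2/vda1-8:226 blocked for more than 120 seconds.",
        "[Fri Mar 24 20:30:40 2017]       Not tainted 3.13.0-52-generic #85-Ubuntu",
    ]
    for event in find_hung_tasks(sample):
        print(f"{event.task} (pid {event.pid}): stalled before {event.stalled_before}, "
              f"reported at {event.reported_at}")
```

The window this produces is only as accurate as the guest clock. Compare it against the hypervisor's clock before cutting the corresponding section out of the client-side log.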

Thanks,
Jason

[1] http://tracker.ceph.com/projects/rbd/issues

On Mon, Mar 27, 2017 at 3:31 PM, Hall, Eric <eric.h...@vanderbilt.edu> wrote:
> In an OpenStack (mitaka) cloud, backed by a ceph cluster (10.2.6 jewel), 
> using libvirt/qemu (1.3.1/2.5) hypervisors on Ubuntu 14.04.5 compute and ceph 
> hosts, we occasionally see hung processes (usually during boot, but otherwise 
> as well), with errors reported in the instance logs as shown below.  
> Configuration is vanilla, based on openstack/ceph docs.
>
> Neither the compute hosts nor the ceph hosts appear to be overloaded in terms 
> of memory or network bandwidth, none of the 67 osds are over 80% full, nor do 
> any of them appear to be overwhelmed in terms of IO.  Compute hosts and ceph 
> cluster are connected via a relatively quiet 1Gb network, with an IBoE net 
> between the ceph nodes.  Neither network appears overloaded.
>
> I don’t see any related (to my eye) errors in client or server logs, even 
> with 20/20 logging from various components (rbd, rados, client, objectcacher, 
> etc.)  I’ve increased the qemu file descriptor limit (currently 64k... 
> overkill for sure.)
>
> It “feels” like a performance problem, but I can’t find any capacity issues or 
> constraining bottlenecks.
>
> Any suggestions or insights into this situation are appreciated.  Thank you 
> for your time,
> --
> Eric
>
>
> [Fri Mar 24 20:30:40 2017] INFO: task jbd2/vda1-8:226 blocked for more than 120 seconds.
> [Fri Mar 24 20:30:40 2017]       Not tainted 3.13.0-52-generic #85-Ubuntu
> [Fri Mar 24 20:30:40 2017] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [Fri Mar 24 20:30:40 2017] jbd2/vda1-8     D ffff88043fd13180     0   226      2 0x00000000
> [Fri Mar 24 20:30:40 2017]  ffff88003728bbd8 0000000000000046 ffff880426900000 ffff88003728bfd8
> [Fri Mar 24 20:30:40 2017]  0000000000013180 0000000000013180 ffff880426900000 ffff88043fd13a18
> [Fri Mar 24 20:30:40 2017]  ffff88043ffb9478 0000000000000002 ffffffff811ef7c0 ffff88003728bc50
> [Fri Mar 24 20:30:40 2017] Call Trace:
> [Fri Mar 24 20:30:40 2017]  [<ffffffff811ef7c0>] ? generic_block_bmap+0x50/0x50
> [Fri Mar 24 20:30:40 2017]  [<ffffffff81726d2d>] io_schedule+0x9d/0x140
> [Fri Mar 24 20:30:40 2017]  [<ffffffff811ef7ce>] sleep_on_buffer+0xe/0x20
> [Fri Mar 24 20:30:40 2017]  [<ffffffff817271b2>] __wait_on_bit+0x62/0x90
> [Fri Mar 24 20:30:40 2017]  [<ffffffff811ef7c0>] ? generic_block_bmap+0x50/0x50
> [Fri Mar 24 20:30:40 2017]  [<ffffffff81727257>] out_of_line_wait_on_bit+0x77/0x90
> [Fri Mar 24 20:30:40 2017]  [<ffffffff810ab180>] ? autoremove_wake_function+0x40/0x40
> [Fri Mar 24 20:30:40 2017]  [<ffffffff811f0afa>] __wait_on_buffer+0x2a/0x30
> [Fri Mar 24 20:30:40 2017]  [<ffffffff8128bb4d>] jbd2_journal_commit_transaction+0x185d/0x1ab0
> [Fri Mar 24 20:30:40 2017]  [<ffffffff810755df>] ? try_to_del_timer_sync+0x4f/0x70
> [Fri Mar 24 20:30:40 2017]  [<ffffffff8128fe7d>] kjournald2+0xbd/0x250
> [Fri Mar 24 20:30:40 2017]  [<ffffffff810ab140>] ? prepare_to_wait_event+0x100/0x100
> [Fri Mar 24 20:30:40 2017]  [<ffffffff8128fdc0>] ? commit_timeout+0x10/0x10
> [Fri Mar 24 20:30:40 2017]  [<ffffffff8108b5d2>] kthread+0xd2/0xf0
> [Fri Mar 24 20:30:40 2017]  [<ffffffff8108b500>] ? kthread_create_on_node+0x1c0/0x1c0
> [Fri Mar 24 20:30:40 2017]  [<ffffffff8173304c>] ret_from_fork+0x7c/0xb0
> [Fri Mar 24 20:30:40 2017]  [<ffffffff8108b500>] ? kthread_create_on_node+0x1c0/0x1c0
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com