Kernel 4.4 is ancient in terms of Ceph support; we've also encountered a
lot of similar hangs with older kernels and cephfs.
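
For anyone who wants to check both points raised in this thread on a client, here is a minimal sketch, not a definitive tool. It compares a kernel release string against the 4.14 minimum suggested below and dumps the non-empty lines of each osdc file under /sys/kernel/debug/ceph. The function names (kernel_version, older_than, osdc_contents) are made up for illustration. It assumes debugfs is mounted and the script runs as root, and it does not try to parse the osdc lines, because their layout may differ between kernel versions.

```python
import glob
import os
import re


def kernel_version(release):
    """Return (major, minor) parsed from a release string such as '4.4.0-116-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


def older_than(release, minimum=(4, 14)):
    """True if the release is below the minimum (4.14 is the suggestion from this thread)."""
    return kernel_version(release) < minimum


def osdc_contents(debug_root="/sys/kernel/debug/ceph"):
    """Map each client's osdc file to its non-empty lines, left unparsed."""
    contents = {}
    for path in sorted(glob.glob(os.path.join(debug_root, "*", "osdc"))):
        try:
            with open(path) as handle:
                lines = [line.rstrip("\n") for line in handle if line.strip()]
        except OSError:
            # Unreadable without root, or the client went away mid-scan.
            continue
        contents[path] = lines
    return contents


if __name__ == "__main__":
    release = os.uname().release
    if older_than(release):
        print("kernel %s is older than 4.14" % release)
    for path, lines in osdc_contents().items():
        print("%s: %d line(s)" % (path, len(lines)))
        for line in lines:
            print("  " + line)
```

Running it a few times while a write is stuck and comparing the output is probably the most useful way to use it: entries that stay listed across runs are the candidates for a hung request.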


Paul

2018-05-15 16:56 GMT+02:00 David C <[email protected]>:

> I've seen similar behavior with the cephfs client around that age; try 4.14+.
>
> On 15 May 2018 1:57 p.m., "Josef Zelenka" <[email protected]> wrote:
>
> The client's kernel is 4.4.0. Regarding the hung OSD request, I'll have to
> check; the issue is gone now, so I'm not sure I'll find what you are
> suggesting. It's rather odd, because Ceph's failover has worked for us every
> time, so I'm trying to figure out whether it is a Ceph or an app issue.
>
>
>
> On 15/05/18 02:57, Yan, Zheng wrote:
> > On Mon, May 14, 2018 at 5:37 PM, Josef Zelenka
> > <[email protected]> wrote:
> >> Hi everyone, we've encountered an unusual thing in our setup (4 nodes, 48
> >> OSDs, 3 monitors - Ceph Jewel, Ubuntu 16.04 with kernel 4.4.0). Yesterday,
> >> we were doing a HW upgrade of the nodes, so they went down one by one - the
> >> cluster was in good shape during the upgrade, as we've done this numerous
> >> times and we're quite sure that the redundancy wasn't screwed up while doing
> >> this. However, during this upgrade one of the clients that does backups to
> >> cephfs (mounted via the kernel driver) failed to write the backup file
> >> correctly to the cluster with the following trace after we turned off one of
> >> the nodes:
> >>
> >> [2585732.529412]  ffff8800baa279a8 ffffffff813fb2df ffff880236230e00 ffff8802339c0000
> >> [2585732.529414]  ffff8800baa28000 ffff88023fc96e00 7fffffffffffffff ffff8800baa27b20
> >> [2585732.529415]  ffffffff81840ed0 ffff8800baa279c0 ffffffff818406d5 0000000000000000
> >> [2585732.529417] Call Trace:
> >> [2585732.529505]  [<ffffffff813fb2df>] ? cpumask_next_and+0x2f/0x40
> >> [2585732.529558]  [<ffffffff81840ed0>] ? bit_wait+0x60/0x60
> >> [2585732.529560]  [<ffffffff818406d5>] schedule+0x35/0x80
> >> [2585732.529562]  [<ffffffff81843825>] schedule_timeout+0x1b5/0x270
> >> [2585732.529607]  [<ffffffff810642be>] ? kvm_clock_get_cycles+0x1e/0x20
> >> [2585732.529609]  [<ffffffff81840ed0>] ? bit_wait+0x60/0x60
> >> [2585732.529611]  [<ffffffff8183fc04>] io_schedule_timeout+0xa4/0x110
> >> [2585732.529613]  [<ffffffff81840eeb>] bit_wait_io+0x1b/0x70
> >> [2585732.529614]  [<ffffffff81840c6e>] __wait_on_bit_lock+0x4e/0xb0
> >> [2585732.529652]  [<ffffffff8118f3cb>] __lock_page+0xbb/0xe0
> >> [2585732.529674]  [<ffffffff810c4460>] ? autoremove_wake_function+0x40/0x40
> >> [2585732.529676]  [<ffffffff8119078d>] pagecache_get_page+0x17d/0x1c0
> >> [2585732.529730]  [<ffffffffc056b3a8>] ? ceph_pool_perm_check+0x48/0x700 [ceph]
> >> [2585732.529732]  [<ffffffff811907f6>] grab_cache_page_write_begin+0x26/0x40
> >> [2585732.529738]  [<ffffffffc056a6a8>] ceph_write_begin+0x48/0xe0 [ceph]
> >> [2585732.529739]  [<ffffffff8118fd6e>] generic_perform_write+0xce/0x1c0
> >> [2585732.529763]  [<ffffffff8122bdb9>] ? file_update_time+0xc9/0x110
> >> [2585732.529769]  [<ffffffffc05651c9>] ceph_write_iter+0xf89/0x1040 [ceph]
> >> [2585732.529792]  [<ffffffff81199c19>] ? __alloc_pages_nodemask+0x159/0x2a0
> >> [2585732.529808]  [<ffffffff8120fedb>] new_sync_write+0x9b/0xe0
> >> [2585732.529811]  [<ffffffff8120ff46>] __vfs_write+0x26/0x40
> >> [2585732.529812]  [<ffffffff812108c9>] vfs_write+0xa9/0x1a0
> >> [2585732.529814]  [<ffffffff81211585>] SyS_write+0x55/0xc0
> >> [2585732.529817]  [<ffffffff818447f2>] entry_SYSCALL_64_fastpath+0x16/0x71
> >>
> >>
> > Is there any hung OSD request in /sys/kernel/debug/ceph/xxxx/osdc?
> >
> >> I have encountered this behavior on Luminous, but not on Jewel. Anyone who
> >> has a clue why the write fails? As far as I'm concerned, it should always
> >> work if all the PGs are available. Thanks
> >> Josef
> >>
> >> _______________________________________________
> >> ceph-users mailing list
> >> [email protected]
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90