Kernel 4.4 is ancient in terms of Ceph support; we've also encountered a lot of similar hangs with older kernels and cephfs.
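
In case it helps, here is a rough sketch for checking both points from this thread on the client: whether the running kernel is older than the 4.14 suggested below, and whether anything is listed in the osdc files under /sys/kernel/debug/ceph. The function names are made up for illustration, and I'm not assuming anything about the layout of the osdc file, so it only prints whatever non-empty lines it finds. Reading debugfs normally needs root.

```python
import glob
import os
import platform
import re


def kernel_version(release):
    """Return (major, minor) parsed from a release string such as '4.4.0-124-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def is_old_for_cephfs(release, minimum=(4, 14)):
    """True if the kernel is older than `minimum` (4.14 is the suggestion from this thread)."""
    return kernel_version(release) < minimum


def pending_osd_requests(debug_root="/sys/kernel/debug/ceph"):
    """Map each readable osdc file under debug_root to its non-empty lines.

    Files that are empty or cannot be read are skipped, so the result is {}
    when debugfs is not mounted or the caller lacks permission.
    """
    pending = {}
    for path in glob.glob(os.path.join(debug_root, "*", "osdc")):
        try:
            with open(path) as handle:
                lines = [line.rstrip("\n") for line in handle if line.strip()]
        except OSError:
            continue
        if lines:
            pending[path] = lines
    return pending


if __name__ == "__main__":
    release = platform.release()
    if is_old_for_cephfs(release):
        print(f"kernel {release} is older than 4.14")
    for path, lines in pending_osd_requests().items():
        print(path)
        for line in lines:
            print("   ", line)
```

Running it twice a few seconds apart and comparing the output is one way to tell a request that is stuck from one that is merely in flight.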

Paul

2018-05-15 16:56 GMT+02:00 David C <[email protected]>:

> I've seen similar behavior with cephfs client around that age, try 4.14+
>
> On 15 May 2018 1:57 p.m., "Josef Zelenka" <[email protected]>
> wrote:
>
> Client's kernel is 4.4.0. Regarding the hung osd request, i'll have to
> check, the issue is gone now, so i'm not sure if i'll find what you are
> suggesting. It's rather odd, because Ceph's failover worked for us every
> time, so i'm trying to figure out whether it is a ceph or app issue.
>
> On 15/05/18 02:57, Yan, Zheng wrote:
> > On Mon, May 14, 2018 at 5:37 PM, Josef Zelenka
> > <[email protected]> wrote:
> >> Hi everyone, we've encountered an unusual thing in our setup(4 nodes, 48
> >> OSDs, 3 monitors - ceph Jewel, Ubuntu 16.04 with kernel 4.4.0). Yesterday,
> >> we were doing a HW upgrade of the nodes, so they went down one by one - the
> >> cluster was in good shape during the upgrade, as we've done this numerous
> >> times and we're quite sure that the redundancy wasn't screwed up while doing
> >> this. However, during this upgrade one of the clients that does backups to
> >> cephfs(mounted via the kernel driver) failed to write the backup file
> >> correctly to the cluster with the following trace after we turned off one of
> >> the nodes:
> >>
> >> [2585732.529412] ffff8800baa279a8 ffffffff813fb2df ffff880236230e00 ffff8802339c0000
> >> [2585732.529414] ffff8800baa28000 ffff88023fc96e00 7fffffffffffffff ffff8800baa27b20
> >> [2585732.529415] ffffffff81840ed0 ffff8800baa279c0 ffffffff818406d5 0000000000000000
> >> [2585732.529417] Call Trace:
> >> [2585732.529505] [<ffffffff813fb2df>] ? cpumask_next_and+0x2f/0x40
> >> [2585732.529558] [<ffffffff81840ed0>] ? bit_wait+0x60/0x60
> >> [2585732.529560] [<ffffffff818406d5>] schedule+0x35/0x80
> >> [2585732.529562] [<ffffffff81843825>] schedule_timeout+0x1b5/0x270
> >> [2585732.529607] [<ffffffff810642be>] ? kvm_clock_get_cycles+0x1e/0x20
> >> [2585732.529609] [<ffffffff81840ed0>] ? bit_wait+0x60/0x60
> >> [2585732.529611] [<ffffffff8183fc04>] io_schedule_timeout+0xa4/0x110
> >> [2585732.529613] [<ffffffff81840eeb>] bit_wait_io+0x1b/0x70
> >> [2585732.529614] [<ffffffff81840c6e>] __wait_on_bit_lock+0x4e/0xb0
> >> [2585732.529652] [<ffffffff8118f3cb>] __lock_page+0xbb/0xe0
> >> [2585732.529674] [<ffffffff810c4460>] ? autoremove_wake_function+0x40/0x40
> >> [2585732.529676] [<ffffffff8119078d>] pagecache_get_page+0x17d/0x1c0
> >> [2585732.529730] [<ffffffffc056b3a8>] ? ceph_pool_perm_check+0x48/0x700 [ceph]
> >> [2585732.529732] [<ffffffff811907f6>] grab_cache_page_write_begin+0x26/0x40
> >> [2585732.529738] [<ffffffffc056a6a8>] ceph_write_begin+0x48/0xe0 [ceph]
> >> [2585732.529739] [<ffffffff8118fd6e>] generic_perform_write+0xce/0x1c0
> >> [2585732.529763] [<ffffffff8122bdb9>] ? file_update_time+0xc9/0x110
> >> [2585732.529769] [<ffffffffc05651c9>] ceph_write_iter+0xf89/0x1040 [ceph]
> >> [2585732.529792] [<ffffffff81199c19>] ? __alloc_pages_nodemask+0x159/0x2a0
> >> [2585732.529808] [<ffffffff8120fedb>] new_sync_write+0x9b/0xe0
> >> [2585732.529811] [<ffffffff8120ff46>] __vfs_write+0x26/0x40
> >> [2585732.529812] [<ffffffff812108c9>] vfs_write+0xa9/0x1a0
> >> [2585732.529814] [<ffffffff81211585>] SyS_write+0x55/0xc0
> >> [2585732.529817] [<ffffffff818447f2>] entry_SYSCALL_64_fastpath+0x16/0x71
> >>
> >
> > is there any hang osd request in /sys/kernel/debug/ceph/xxxx/osdc?
> >
> >> I have encountered this behavior on Luminous, but not on Jewel. Anyone who
> >> has a clue why the write fails? As far as i'm concerned, it should always
> >> work if all the PGs are available. Thanks
> >> Josef
> >>
> >> _______________________________________________
> >> ceph-users mailing list
> >> [email protected]
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
