You should never run a production cluster with this configuration: an EC 2+1 pool cannot tolerate losing two shards of a PG, which is exactly what has happened here. Have you tried to access the disk with ceph-objectstore-tool? The goal would be to export the shard of the PG that is on that disk and import it into any other OSD.
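Roughly something along these lines (a sketch only, untested against your cluster: substitute the real PG id, OSD ids and export path; the OSDs involved must be stopped while ceph-objectstore-tool runs against them, and for an EC pool the PG id includes the shard suffix, e.g. <pgid>s1):

  # on the host of the failing osd.16, with the OSD stopped:
  # check that the shard is still present/readable on the disk
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-16 --op list-pgs

  # export the shard of the incomplete PG (<pgid> is a placeholder)
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-16 \
      --pgid <pgid> --op export --file /tmp/<pgid>.export

  # import it into any other (stopped) OSD, then start that OSD again
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-NN \
      --op import --file /tmp/<pgid>.export

If the export aborts with the same aio error, the disk itself is probably failing and you may need to copy the block device to a healthy drive (e.g. with ddrescue) before trying again.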
Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Wed, Mar 13, 2019 at 7:08 PM Benjamin.Zieglmeier
<benjamin.zieglme...@target.com> wrote:
>
> After restarting several OSD daemons in our ceph cluster a couple days ago,
> a couple of our OSDs won’t come online. The services start and crash with
> the below error. We have one pg marked as incomplete, and will not peer.
> The pool is erasure coded, 2+1, currently set to size=3, min_size=2. The
> incomplete pg states it is not peering due to:
>
> "comment": "not enough complete instances of this PG"
>
> and:
>
> "down_osds_we_would_probe": [
>     7,
>     16
> ],
>
> 7 is completely lost, drive dead, 16 will not come online (refer to log
> output below).
>
> We’ve tried searching user-list and tweaking osd conf settings for several
> days, to no avail. Reaching out here as a last ditch effort before we have
> to give up on the pg.
>
> tcmalloc: large alloc 1073741824 bytes == 0x560ada35c000 @ 0x7f5c1081e4ef 0x7f5c1083dbd6 0x7f5c0e945ab9 0x7f5c0e9466cb 0x7f5c0e946774 0x7f5c0e9469df 0x560a8fdb7db0 0x560a8fda8d28 0x560a8fdaa6b6 0x560a8fdab973 0x560a8fdacbb6 0x560a8f9f8f88 0x560a8f983d83 0x560a8f9b5d7e 0x560a8f474069 0x7f5c0dfc5445 0x560a8f514373
> tcmalloc: large alloc 2147483648 bytes == 0x560b1a35c000 @ 0x7f5c1081e4ef 0x7f5c1083dbd6 0x7f5c0e945ab9 0x7f5c0e9466cb 0x7f5c0e946774 0x7f5c0e9469df 0x560a8fdb7db0 0x560a8fda8d28 0x560a8fdaa6b6 0x560a8fdab973 0x560a8fdacbb6 0x560a8f9f8f88 0x560a8f983d83 0x560a8f9b5d7e 0x560a8f474069 0x7f5c0dfc5445 0x560a8f514373
> tcmalloc: large alloc 4294967296 bytes == 0x560b9a35c000 @ 0x7f5c1081e4ef 0x7f5c1083dbd6 0x7f5c0e945ab9 0x7f5c0e9466cb 0x7f5c0e946774 0x7f5c0e9469df 0x560a8fdb7db0 0x560a8fda8d28 0x560a8fdaa6b6 0x560a8fdab973 0x560a8fdacbb6 0x560a8f9f8f88 0x560a8f983d83 0x560a8f9b5d7e 0x560a8f474069 0x7f5c0dfc5445 0x560a8f514373
> tcmalloc: large alloc 3840745472 bytes == 0x560a9a334000 @ 0x7f5c1081e4ef 0x7f5c1083dbd6 0x7f5c0e945ab9 0x7f5c0e945c76 0x7f5c0e94623e 0x560a8fdea280 0x560a8fda8f36 0x560a8fdaa6b6 0x560a8fdab973 0x560a8fdacbb6 0x560a8f9f8f88 0x560a8f983d83 0x560a8f9b5d7e 0x560a8f474069 0x7f5c0dfc5445 0x560a8f514373
> tcmalloc: large alloc 2728992768 bytes == 0x560e779ee000 @ 0x7f5c1081e4ef 0x7f5c1083f010 0x560a8faa5674 0x560a8faa7125 0x560a8fa835a7 0x560a8fa5aa3c 0x560a8fa5c238 0x560a8fa77dcc 0x560a8fe439ef 0x560a8fe43c03 0x560a8fe5acd4 0x560a8fda75ec 0x560a8fda9260 0x560a8fdaa6b6 0x560a8fdab973 0x560a8fdacbb6 0x560a8f9f8f88 0x560a8f983d83 0x560a8f9b5d7e 0x560a8f474069 0x7f5c0dfc5445 0x560a8f514373
>
> /builddir/build/BUILD/ceph-12.2.5/src/os/bluestore/KernelDevice.cc: In function 'void KernelDevice::_aio_thread()' thread 7f5c0a749700 time 2019-03-13 12:46:39.632156
> /builddir/build/BUILD/ceph-12.2.5/src/os/bluestore/KernelDevice.cc: 384: FAILED assert(0 == "unexpected aio error")
> 2019-03-13 12:46:39.632132 7f5c0a749700 -1 bdev(0x560a99c05000 /var/lib/ceph/osd/ceph-16/block) aio to 4817558700032~2728988672 but returned: 2147479552
>
> ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x560a8fadd2a0]
> 2: (KernelDevice::_aio_thread()+0xd34) [0x560a8fa7fe24]
> 3: (KernelDevice::AioCompletionThread::entry()+0xd) [0x560a8fa8517d]
> 4: (()+0x7e25) [0x7f5c0efb0e25]
> 5: (clone()+0x6d) [0x7f5c0e0a1bad]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> 2019-03-13 12:46:39.633822 7f5c0a749700 -1 /builddir/build/BUILD/ceph-12.2.5/src/os/bluestore/KernelDevice.cc: In function 'void KernelDevice::_aio_thread()' thread 7f5c0a749700 time 2019-03-13 12:46:39.632156
> /builddir/build/BUILD/ceph-12.2.5/src/os/bluestore/KernelDevice.cc: 384: FAILED assert(0 == "unexpected aio error")
>
> ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x560a8fadd2a0]
> 2: (KernelDevice::_aio_thread()+0xd34) [0x560a8fa7fe24]
> 3: (KernelDevice::AioCompletionThread::entry()+0xd) [0x560a8fa8517d]
> 4: (()+0x7e25) [0x7f5c0efb0e25]
> 5: (clone()+0x6d) [0x7f5c0e0a1bad]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> -1> 2019-03-13 12:46:39.632132 7f5c0a749700 -1 bdev(0x560a99c05000 /var/lib/ceph/osd/ceph-16/block) aio to 4817558700032~2728988672 but returned: 2147479552
> 0> 2019-03-13 12:46:39.633822 7f5c0a749700 -1 /builddir/build/BUILD/ceph-12.2.5/src/os/bluestore/KernelDevice.cc: In function 'void KernelDevice::_aio_thread()' thread 7f5c0a749700 time 2019-03-13 12:46:39.632156
> /builddir/build/BUILD/ceph-12.2.5/src/os/bluestore/KernelDevice.cc: 384: FAILED assert(0 == "unexpected aio error")
>
> ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x560a8fadd2a0]
> 2: (KernelDevice::_aio_thread()+0xd34) [0x560a8fa7fe24]
> 3: (KernelDevice::AioCompletionThread::entry()+0xd) [0x560a8fa8517d]
> 4: (()+0x7e25) [0x7f5c0efb0e25]
> 5: (clone()+0x6d) [0x7f5c0e0a1bad]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> *** Caught signal (Aborted) **
> in thread 7f5c0a749700 thread_name:bstore_aio
> ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)
> 1: (()+0xa41911) [0x560a8fa9e911]
> 2: (()+0xf6d0) [0x7f5c0efb86d0]
> 3: (gsignal()+0x37) [0x7f5c0dfd9277]
> 4: (abort()+0x148) [0x7f5c0dfda968]
> 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x284) [0x560a8fadd414]
> 6: (KernelDevice::_aio_thread()+0xd34) [0x560a8fa7fe24]
> 7: (KernelDevice::AioCompletionThread::entry()+0xd) [0x560a8fa8517d]
> 8: (()+0x7e25) [0x7f5c0efb0e25]
> 9: (clone()+0x6d) [0x7f5c0e0a1bad]
> 2019-03-13 12:46:39.635955 7f5c0a749700 -1 *** Caught signal (Aborted) **
> in thread 7f5c0a749700 thread_name:bstore_aio
>
> ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)
> 1: (()+0xa41911) [0x560a8fa9e911]
> 2: (()+0xf6d0) [0x7f5c0efb86d0]
> 3: (gsignal()+0x37) [0x7f5c0dfd9277]
> 4: (abort()+0x148) [0x7f5c0dfda968]
> 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x284) [0x560a8fadd414]
> 6: (KernelDevice::_aio_thread()+0xd34) [0x560a8fa7fe24]
> 7: (KernelDevice::AioCompletionThread::entry()+0xd) [0x560a8fa8517d]
> 8: (()+0x7e25) [0x7f5c0efb0e25]
> 9: (clone()+0x6d) [0x7f5c0e0a1bad]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> 0> 2019-03-13 12:46:39.635955 7f5c0a749700 -1 *** Caught signal (Aborted) **
> in thread 7f5c0a749700 thread_name:bstore_aio
>
> ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)
> 1: (()+0xa41911) [0x560a8fa9e911]
> 2: (()+0xf6d0) [0x7f5c0efb86d0]
> 3: (gsignal()+0x37) [0x7f5c0dfd9277]
> 4: (abort()+0x148) [0x7f5c0dfda968]
> 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x284) [0x560a8fadd414]
> 6: (KernelDevice::_aio_thread()+0xd34) [0x560a8fa7fe24]
> 7: (KernelDevice::AioCompletionThread::entry()+0xd) [0x560a8fa8517d]
> 8: (()+0x7e25) [0x7f5c0efb0e25]
> 9: (clone()+0x6d) [0x7f5c0e0a1bad]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> Aborted
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com