You should never run a production cluster with this configuration.

Have you tried to access the disk with ceph-objectstore-tool? The goal
would be to export the shard of that PG from the failing disk and import
it into any other OSD.
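
Roughly something like this (an untested sketch; the PG ID and the target
OSD are placeholders, and the OSD daemons must be stopped while you run it):

  # on the host with the crashing OSD (osd.16): export the PG shard
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-16 \
      --pgid <pgid> --op export --file /tmp/<pgid>.export

  # on a host with a healthy OSD: import the shard, then start that OSD again
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<target-osd> \
      --op import --file /tmp/<pgid>.export

Note that for an EC pool the PG ID includes the shard suffix (e.g. <pgid>s0).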


Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Wed, Mar 13, 2019 at 7:08 PM Benjamin.Zieglmeier
<benjamin.zieglme...@target.com> wrote:
>
> After restarting several OSD daemons in our ceph cluster a couple of days ago,
> a couple of our OSDs won’t come online. The services start and then crash with
> the error below. We have one PG marked as incomplete, and it will not peer. The
> pool is erasure coded (2+1), currently set to size=3, min_size=2. The incomplete
> PG reports that it is not peering due to:
>
>
>
> "comment": "not enough complete instances of this PG" and:
>
>            "down_osds_we_would_probe": [
>
>                 7,
>
>                 16
>
>             ],
>
> OSD 7 is completely lost (drive dead); OSD 16 will not come online (refer to
> the log output below).
>
>
>
> We’ve spent several days searching the user list and tweaking OSD config
> settings, to no avail. Reaching out here as a last-ditch effort before we
> have to give up on the PG.
>
>
>
> tcmalloc: large alloc 1073741824 bytes == 0x560ada35c000 @  0x7f5c1081e4ef 
> 0x7f5c1083dbd6 0x7f5c0e945ab9 0x7f5c0e9466cb 0x7f5c0e946774 0x7f5c0e9469df 
> 0x560a8fdb7db0 0x560a8fda8d28 0x560a8fdaa6b6 0x560a8fdab973 0x560a8fdacbb6 
> 0x560a8f9f8f88 0x560a8f983d83 0x560a8f9b5d7e 0x560a8f474069 0x7f5c0dfc5445 
> 0x560a8f514373
>
> tcmalloc: large alloc 2147483648 bytes == 0x560b1a35c000 @  0x7f5c1081e4ef 
> 0x7f5c1083dbd6 0x7f5c0e945ab9 0x7f5c0e9466cb 0x7f5c0e946774 0x7f5c0e9469df 
> 0x560a8fdb7db0 0x560a8fda8d28 0x560a8fdaa6b6 0x560a8fdab973 0x560a8fdacbb6 
> 0x560a8f9f8f88 0x560a8f983d83 0x560a8f9b5d7e 0x560a8f474069 0x7f5c0dfc5445 
> 0x560a8f514373
>
> tcmalloc: large alloc 4294967296 bytes == 0x560b9a35c000 @  0x7f5c1081e4ef 
> 0x7f5c1083dbd6 0x7f5c0e945ab9 0x7f5c0e9466cb 0x7f5c0e946774 0x7f5c0e9469df 
> 0x560a8fdb7db0 0x560a8fda8d28 0x560a8fdaa6b6 0x560a8fdab973 0x560a8fdacbb6 
> 0x560a8f9f8f88 0x560a8f983d83 0x560a8f9b5d7e 0x560a8f474069 0x7f5c0dfc5445 
> 0x560a8f514373
>
> tcmalloc: large alloc 3840745472 bytes == 0x560a9a334000 @  0x7f5c1081e4ef 
> 0x7f5c1083dbd6 0x7f5c0e945ab9 0x7f5c0e945c76 0x7f5c0e94623e 0x560a8fdea280 
> 0x560a8fda8f36 0x560a8fdaa6b6 0x560a8fdab973 0x560a8fdacbb6 0x560a8f9f8f88 
> 0x560a8f983d83 0x560a8f9b5d7e 0x560a8f474069 0x7f5c0dfc5445 0x560a8f514373
>
> tcmalloc: large alloc 2728992768 bytes == 0x560e779ee000 @  0x7f5c1081e4ef 
> 0x7f5c1083f010 0x560a8faa5674 0x560a8faa7125 0x560a8fa835a7 0x560a8fa5aa3c 
> 0x560a8fa5c238 0x560a8fa77dcc 0x560a8fe439ef 0x560a8fe43c03 0x560a8fe5acd4 
> 0x560a8fda75ec 0x560a8fda9260 0x560a8fdaa6b6 0x560a8fdab973 0x560a8fdacbb6 
> 0x560a8f9f8f88 0x560a8f983d83 0x560a8f9b5d7e 0x560a8f474069 0x7f5c0dfc5445 
> 0x560a8f514373
>
> /builddir/build/BUILD/ceph-12.2.5/src/os/bluestore/KernelDevice.cc: In 
> function 'void KernelDevice::_aio_thread()' thread 7f5c0a749700 time 
> 2019-03-13 12:46:39.632156
>
> /builddir/build/BUILD/ceph-12.2.5/src/os/bluestore/KernelDevice.cc: 384: 
> FAILED assert(0 == "unexpected aio error")
>
> 2019-03-13 12:46:39.632132 7f5c0a749700 -1 bdev(0x560a99c05000 
> /var/lib/ceph/osd/ceph-16/block) aio to 4817558700032~2728988672 but 
> returned: 2147479552
>
>  ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous 
> (stable)
>
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
> const*)+0x110) [0x560a8fadd2a0]
>
>  2: (KernelDevice::_aio_thread()+0xd34) [0x560a8fa7fe24]
>
>  3: (KernelDevice::AioCompletionThread::entry()+0xd) [0x560a8fa8517d]
>
>  4: (()+0x7e25) [0x7f5c0efb0e25]
>
>  5: (clone()+0x6d) [0x7f5c0e0a1bad]
>
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to 
> interpret this.
>
> 2019-03-13 12:46:39.633822 7f5c0a749700 -1 
> /builddir/build/BUILD/ceph-12.2.5/src/os/bluestore/KernelDevice.cc: In 
> function 'void KernelDevice::_aio_thread()' thread 7f5c0a749700 time 
> 2019-03-13 12:46:39.632156
>
> /builddir/build/BUILD/ceph-12.2.5/src/os/bluestore/KernelDevice.cc: 384: 
> FAILED assert(0 == "unexpected aio error")
>
>
>
>  ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous 
> (stable)
>
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
> const*)+0x110) [0x560a8fadd2a0]
>
>  2: (KernelDevice::_aio_thread()+0xd34) [0x560a8fa7fe24]
>
>  3: (KernelDevice::AioCompletionThread::entry()+0xd) [0x560a8fa8517d]
>
>  4: (()+0x7e25) [0x7f5c0efb0e25]
>
>  5: (clone()+0x6d) [0x7f5c0e0a1bad]
>
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to 
> interpret this.
>
>
>
>     -1> 2019-03-13 12:46:39.632132 7f5c0a749700 -1 bdev(0x560a99c05000 
> /var/lib/ceph/osd/ceph-16/block) aio to 4817558700032~2728988672 but 
> returned: 2147479552
>
>      0> 2019-03-13 12:46:39.633822 7f5c0a749700 -1 
> /builddir/build/BUILD/ceph-12.2.5/src/os/bluestore/KernelDevice.cc: In 
> function 'void KernelDevice::_aio_thread()' thread 7f5c0a749700 time 
> 2019-03-13 12:46:39.632156
>
> /builddir/build/BUILD/ceph-12.2.5/src/os/bluestore/KernelDevice.cc: 384: 
> FAILED assert(0 == "unexpected aio error")
>
>
>
>  ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous 
> (stable)
>
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
> const*)+0x110) [0x560a8fadd2a0]
>
>  2: (KernelDevice::_aio_thread()+0xd34) [0x560a8fa7fe24]
>
>  3: (KernelDevice::AioCompletionThread::entry()+0xd) [0x560a8fa8517d]
>
>  4: (()+0x7e25) [0x7f5c0efb0e25]
>
>  5: (clone()+0x6d) [0x7f5c0e0a1bad]
>
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to 
> interpret this.
>
>
>
> *** Caught signal (Aborted) **
>
>  in thread 7f5c0a749700 thread_name:bstore_aio
>
>  ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous 
> (stable)
>
>  1: (()+0xa41911) [0x560a8fa9e911]
>
>  2: (()+0xf6d0) [0x7f5c0efb86d0]
>
>  3: (gsignal()+0x37) [0x7f5c0dfd9277]
>
>  4: (abort()+0x148) [0x7f5c0dfda968]
>
>  5: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
> const*)+0x284) [0x560a8fadd414]
>
>  6: (KernelDevice::_aio_thread()+0xd34) [0x560a8fa7fe24]
>
>  7: (KernelDevice::AioCompletionThread::entry()+0xd) [0x560a8fa8517d]
>
>  8: (()+0x7e25) [0x7f5c0efb0e25]
>
>  9: (clone()+0x6d) [0x7f5c0e0a1bad]
>
> 2019-03-13 12:46:39.635955 7f5c0a749700 -1 *** Caught signal (Aborted) **
>
>  in thread 7f5c0a749700 thread_name:bstore_aio
>
>
>
>  ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous 
> (stable)
>
>  1: (()+0xa41911) [0x560a8fa9e911]
>
>  2: (()+0xf6d0) [0x7f5c0efb86d0]
>
>  3: (gsignal()+0x37) [0x7f5c0dfd9277]
>
>  4: (abort()+0x148) [0x7f5c0dfda968]
>
>  5: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
> const*)+0x284) [0x560a8fadd414]
>
>  6: (KernelDevice::_aio_thread()+0xd34) [0x560a8fa7fe24]
>
>  7: (KernelDevice::AioCompletionThread::entry()+0xd) [0x560a8fa8517d]
>
>  8: (()+0x7e25) [0x7f5c0efb0e25]
>
>  9: (clone()+0x6d) [0x7f5c0e0a1bad]
>
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to 
> interpret this.
>
>
>
>      0> 2019-03-13 12:46:39.635955 7f5c0a749700 -1 *** Caught signal 
> (Aborted) **
>
>  in thread 7f5c0a749700 thread_name:bstore_aio
>
>
>
>  ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous 
> (stable)
>
>  1: (()+0xa41911) [0x560a8fa9e911]
>
>  2: (()+0xf6d0) [0x7f5c0efb86d0]
>
>  3: (gsignal()+0x37) [0x7f5c0dfd9277]
>
>  4: (abort()+0x148) [0x7f5c0dfda968]
>
>  5: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
> const*)+0x284) [0x560a8fadd414]
>
>  6: (KernelDevice::_aio_thread()+0xd34) [0x560a8fa7fe24]
>
>  7: (KernelDevice::AioCompletionThread::entry()+0xd) [0x560a8fa8517d]
>
>  8: (()+0x7e25) [0x7f5c0efb0e25]
>
>  9: (clone()+0x6d) [0x7f5c0e0a1bad]
>
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to 
> interpret this.
>
>
>
> Aborted
>
>
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
