Hi,
we are currently experiencing OSD daemon crashes and I can't pin down the
cause. I hope someone can help me with it.
* We operate multiple clusters (440 SSD - 1PB, 36 SSD - 126TB, 40 SSD - 100TB,
84 HDD - 680TB)
* All clusters were updated around the same time (2021-02-03)
* We restarted ALL ceph daemons (systemctl restart ceph.target) on
2021-02-11 after we added OOMScoreAdjust=-900 to all service files.
Now, in our main cluster (440 SSD with 1PB), the OSD daemons have begun to crash:
# ceph crash ls
ID ENTITY NEW
2020-03-06_17:37:54.031675Z_0bbbb807-ff2f-46df-9508-58d319b89bd6 osd.397
2020-05-28_12:23:27.677741Z_061f2449-9a36-4747-a2f8-624e72cd1ad0 osd.410
2021-02-05_07:03:35.943384Z_dffab245-4788-4de2-a677-76b735d5fc01 osd.403
2021-02-15_15:41:27.934194Z_97b57f8f-58f2-4390-9d3e-993874e0e000 osd.395
2021-02-15_18:01:19.774879Z_18160e65-4659-451f-8aae-def2984f1f29 osd.178
2021-02-17_04:51:05.101052Z_9f04c6e8-d0c7-442c-9a38-33d5164d2a83 osd.384
osd.384 and osd.395 are on the same node, which had some memory issues that
we fixed on 2021-02-16 at 12:00:00.
osd.384 had been marked out for >24h when the daemon crashed, and there were
no more misplaced objects in the cluster.
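
To separate the old crash entries from the recent ones, the timestamps
embedded in the crash IDs can be compared against the update and restart
dates. The following is only a minimal sketch: the helper names
(crash_time, crashes_since) are made up for illustration, and the IDs are
copied by hand from the listing above rather than read from the cluster.

```python
from datetime import date, datetime

# (crash ID, entity) pairs copied from the `ceph crash ls` output above.
CRASHES = [
    ("2020-03-06_17:37:54.031675Z_0bbbb807-ff2f-46df-9508-58d319b89bd6", "osd.397"),
    ("2020-05-28_12:23:27.677741Z_061f2449-9a36-4747-a2f8-624e72cd1ad0", "osd.410"),
    ("2021-02-05_07:03:35.943384Z_dffab245-4788-4de2-a677-76b735d5fc01", "osd.403"),
    ("2021-02-15_15:41:27.934194Z_97b57f8f-58f2-4390-9d3e-993874e0e000", "osd.395"),
    ("2021-02-15_18:01:19.774879Z_18160e65-4659-451f-8aae-def2984f1f29", "osd.178"),
    ("2021-02-17_04:51:05.101052Z_9f04c6e8-d0c7-442c-9a38-33d5164d2a83", "osd.384"),
]

UPDATE_DAY = date(2021, 2, 3)    # clusters were updated around this date
RESTART_DAY = date(2021, 2, 11)  # all daemons restarted with OOMScoreAdjust


def crash_time(crash_id):
    """Return the timestamp encoded in the leading part of a crash ID."""
    stamp = crash_id.split("Z_", 1)[0]
    return datetime.strptime(stamp, "%Y-%m-%d_%H:%M:%S.%f")


def crashes_since(day, crashes=CRASHES):
    """Return (timestamp, entity) for every crash on or after `day`."""
    return [
        (crash_time(cid), entity)
        for cid, entity in crashes
        if crash_time(cid).date() >= day
    ]


if __name__ == "__main__":
    for when, entity in crashes_since(RESTART_DAY):
        print(when.isoformat(), entity)
```

With the listing above, this leaves the three entries from 2021-02-15 and
2021-02-17 (osd.395, osd.178, osd.384) as the crashes after the restart.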
Here is the latest crash dump:
--- begin dump of recent events ---
-9999> 2021-02-17 03:31:31.305 7fcf7e136700 1 do_command 'perf dump'
'result is 30067 bytes
-9998> 2021-02-17 03:31:31.626 7fcf73be6700 5 prioritycache tune_memory
target: 4294967296 mapped: 2882789376 unmapped: 956792832 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
-9997> 2021-02-17 03:31:32.634 7fcf73be6700 5 prioritycache tune_memory
target: 4294967296 mapped: 2882789376 unmapped: 956792832 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
-9996> 2021-02-17 03:31:33.639 7fcf73be6700 5 prioritycache tune_memory
target: 4294967296 mapped: 2882789376 unmapped: 956792832 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
-9995> 2021-02-17 03:31:34.647 7fcf73be6700 5 prioritycache tune_memory
target: 4294967296 mapped: 2882789376 unmapped: 956792832 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
-9994> 2021-02-17 03:31:35.651 7fcf73be6700 5 prioritycache tune_memory
target: 4294967296 mapped: 2882789376 unmapped: 956792832 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
-9993> 2021-02-17 03:31:36.654 7fcf73be6700 5 prioritycache tune_memory
target: 4294967296 mapped: 2882789376 unmapped: 956792832 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
-9992> 2021-02-17 03:31:37.657 7fcf73be6700 5 prioritycache tune_memory
target: 4294967296 mapped: 2882789376 unmapped: 956792832 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
-9991> 2021-02-17 03:31:38.676 7fcf73be6700 5 prioritycache tune_memory
target: 4294967296 mapped: 2882789376 unmapped: 956792832 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
-9990> 2021-02-17 03:31:39.680 7fcf73be6700 5 prioritycache tune_memory
target: 4294967296 mapped: 2882789376 unmapped: 956792832 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
-9989> 2021-02-17 03:31:40.684 7fcf73be6700 5 prioritycache tune_memory
target: 4294967296 mapped: 2882789376 unmapped: 956792832 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
-9988> 2021-02-17 03:31:41.193 7fcf7e136700 1 do_command 'perf dump' '
-9987> 2021-02-17 03:31:41.193 7fcf7e136700 1 do_command 'perf dump'
'result is 30067 bytes
<snip>
-31> 2021-02-17 05:50:41.158 7fcf7e136700 1 do_command 'perf dump' '
-30> 2021-02-17 05:50:41.159 7fcf7e136700 1 do_command 'perf dump'
'result is 30070 bytes
-29> 2021-02-17 05:50:41.804 7fcf73be6700 5 prioritycache tune_memory
target: 4294967296 mapped: 2851831808 unmapped: 987750400 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
-28> 2021-02-17 05:50:42.813 7fcf73be6700 5 prioritycache tune_memory
target: 4294967296 mapped: 2851831808 unmapped: 987750400 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
-27> 2021-02-17 05:50:43.820 7fcf73be6700 5 prioritycache tune_memory
target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
-26> 2021-02-17 05:50:44.825 7fcf73be6700 5 prioritycache tune_memory
target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
-25> 2021-02-17 05:50:45.831 7fcf73be6700 5 prioritycache tune_memory
target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
-24> 2021-02-17 05:50:46.837 7fcf73be6700 5 prioritycache tune_memory
target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
-23> 2021-02-17 05:50:47.840 7fcf73be6700 5 prioritycache tune_memory
target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
-22> 2021-02-17 05:50:48.843 7fcf73be6700 5 prioritycache tune_memory
target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
-21> 2021-02-17 05:50:49.847 7fcf73be6700 5 prioritycache tune_memory
target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
-20> 2021-02-17 05:50:50.853 7fcf73be6700 5 prioritycache tune_memory
target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
-19> 2021-02-17 05:50:51.524 7fcf7e136700 1 do_command 'perf dump' '
-18> 2021-02-17 05:50:51.525 7fcf7e136700 1 do_command 'perf dump'
'result is 30070 bytes
-17> 2021-02-17 05:50:51.859 7fcf73be6700 5 prioritycache tune_memory
target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
-16> 2021-02-17 05:50:52.862 7fcf73be6700 5 prioritycache tune_memory
target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
-15> 2021-02-17 05:50:53.871 7fcf73be6700 5 prioritycache tune_memory
target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
-14> 2021-02-17 05:50:54.875 7fcf73be6700 5 prioritycache tune_memory
target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
-13> 2021-02-17 05:50:55.886 7fcf73be6700 5 prioritycache tune_memory
target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
-12> 2021-02-17 05:50:56.891 7fcf73be6700 5 prioritycache tune_memory
target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
-11> 2021-02-17 05:50:57.905 7fcf73be6700 5 prioritycache tune_memory
target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
-10> 2021-02-17 05:50:58.911 7fcf73be6700 5 prioritycache tune_memory
target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
-9> 2021-02-17 05:50:59.917 7fcf73be6700 5 prioritycache tune_memory
target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
-8> 2021-02-17 05:51:00.929 7fcf73be6700 5 prioritycache tune_memory
target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
-7> 2021-02-17 05:51:01.566 7fcf7e136700 1 do_command 'perf dump' '
-6> 2021-02-17 05:51:01.567 7fcf7e136700 1 do_command 'perf dump'
'result is 30070 bytes
-5> 2021-02-17 05:51:01.935 7fcf73be6700 5 prioritycache tune_memory
target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
-4> 2021-02-17 05:51:02.943 7fcf73be6700 5 prioritycache tune_memory
target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
-3> 2021-02-17 05:51:03.949 7fcf73be6700 5 prioritycache tune_memory
target: 4294967296 mapped: 2851102720 unmapped: 988479488 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
-2> 2021-02-17 05:51:04.967 7fcf73be6700 5 prioritycache tune_memory
target: 4294967296 mapped: 2851102720 unmapped: 988479488 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
-1> 2021-02-17 05:51:05.091 7fcf743e7700 -1
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.16/rpm/el7/BUILD/ceph-14.2.16/src/os/bluestore/fastbmap_allocator_impl.h:
In function 'uint64_t AllocatorLevel02<T>::claim_free_to_right(uint64_t)
[with L1 = AllocatorLevel01Loose; uint64_t = long unsigned int]' thread
7fcf743e7700 time 2021-02-17 05:51:04.998475
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.16/rpm/el7/BUILD/ceph-14.2.16/src/os/bluestore/fastbmap_allocator_impl.h:
572: FAILED ceph_assert(available >= allocated)
ceph version 14.2.16 (762032d6f509d5e7ee7dc008d80fe9c87086603c) nautilus
(stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x14a) [0x561c84cc2c7d]
2: (()+0x4d8e45) [0x561c84cc2e45]
3: (HybridAllocator::_add_to_tree(unsigned long, unsigned long)+0x49e)
[0x561c853167de]
4: (AvlAllocator::_release(interval_set<unsigned long, std::map<unsigned
long, unsigned long, std::less<unsigned long>,
std::allocator<std::pair<unsigned long const, unsigned long> > > >
const&)+0x60) [0x561c85310b20]
5: (HybridAllocator::release(interval_set<unsigned long, std::map<unsigned
long, unsigned long, std::less<unsigned long>,
std::allocator<std::pair<unsigned long const, unsigned long> > > >
const&)+0x3a) [0x561c853143ca]
6: (BlueStore::_txc_release_alloc(BlueStore::TransContext*)+0x5f)
[0x561c851ee83f]
7: (BlueStore::_txc_finish(BlueStore::TransContext*)+0x1be)
[0x561c8522f4ae]
8: (BlueStore::_txc_state_proc(BlueStore::TransContext*)+0xaa)
[0x561c8522fe9a]
9: (BlueStore::_kv_finalize_thread()+0x604) [0x561c85232ed4]
10: (BlueStore::KVFinalizeThread::entry()+0xd) [0x561c852625ed]
11: (()+0x7ea5) [0x7fcf840a2ea5]
12: (clone()+0x6d) [0x7fcf82f6596d]
0> 2021-02-17 05:51:05.145 7fcf743e7700 -1 *** Caught signal (Aborted)
**
in thread 7fcf743e7700 thread_name:bstore_kv_final
ceph version 14.2.16 (762032d6f509d5e7ee7dc008d80fe9c87086603c) nautilus
(stable)
1: (()+0xf630) [0x7fcf840aa630]
2: (gsignal()+0x37) [0x7fcf82e9d387]
3: (abort()+0x148) [0x7fcf82e9ea78]
4: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x199) [0x561c84cc2ccc]
5: (()+0x4d8e45) [0x561c84cc2e45]
6: (HybridAllocator::_add_to_tree(unsigned long, unsigned long)+0x49e)
[0x561c853167de]
7: (AvlAllocator::_release(interval_set<unsigned long, std::map<unsigned
long, unsigned long, std::less<unsigned long>,
std::allocator<std::pair<unsigned long const, unsigned long> > > >
const&)+0x60) [0x561c85310b20]
8: (HybridAllocator::release(interval_set<unsigned long, std::map<unsigned
long, unsigned long, std::less<unsigned long>,
std::allocator<std::pair<unsigned long const, unsigned long> > > >
const&)+0x3a) [0x561c853143ca]
9: (BlueStore::_txc_release_alloc(BlueStore::TransContext*)+0x5f)
[0x561c851ee83f]
10: (BlueStore::_txc_finish(BlueStore::TransContext*)+0x1be)
[0x561c8522f4ae]
11: (BlueStore::_txc_state_proc(BlueStore::TransContext*)+0xaa)
[0x561c8522fe9a]
12: (BlueStore::_kv_finalize_thread()+0x604) [0x561c85232ed4]
13: (BlueStore::KVFinalizeThread::entry()+0xd) [0x561c852625ed]
14: (()+0x7ea5) [0x7fcf840a2ea5]
15: (clone()+0x6d) [0x7fcf82f6596d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
to interpret this.
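
The prioritycache tune_memory lines in the dump can be checked
mechanically. The sketch below re-joins one wrapped record onto a single
line and parses its fields; the function names are hypothetical, and the
only thing it demonstrates is that mapped + unmapped equals heap and that
mapped memory sits below the 4 GiB target in this record. It says nothing
about the failed assert itself.

```python
import re

# One record from the dump above, re-joined onto a single line.
SAMPLE = (
    "-2> 2021-02-17 05:51:04.967 7fcf73be6700 5 prioritycache tune_memory "
    "target: 4294967296 mapped: 2851102720 unmapped: 988479488 "
    "heap: 3839582208 old mem: 2845415832 new mem: 2845415832"
)

# "unmapped" is listed before "mapped" so the longer name is tried first.
FIELD_RE = re.compile(r"(target|unmapped|mapped|heap|old mem|new mem): (\d+)")

GIB = 1024 ** 3


def parse_tune_memory(line):
    """Return the numeric fields of a tune_memory log line as a dict."""
    return {name: int(value) for name, value in FIELD_RE.findall(line)}


def summarize(line):
    """Convert the byte counts to GiB and run two simple sanity checks."""
    f = parse_tune_memory(line)
    return {
        "target_gib": f["target"] / GIB,
        "mapped_gib": round(f["mapped"] / GIB, 2),
        "heap_consistent": f["mapped"] + f["unmapped"] == f["heap"],
        "below_target": f["mapped"] < f["target"],
    }


if __name__ == "__main__":
    print(summarize(SAMPLE))
```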