Dear Mr. Kefu Chai,
Sorry to disturb you.
I ran into a problem recently. In my Ceph cluster, the health status has shown
the warning "store is getting too big!" for several days, and ceph-mon is using
nearly 100% CPU.
Have you ever seen this situation?
Some detailed information is attached below:
root@cvknode17:~# ceph -s
cluster 04afba60-3a77-496c-b616-2ecb5e47e141
health HEALTH_WARN
mon.cvknode17 store is getting too big! 34104 MB >= 15360 MB
monmap e1: 3 mons at
{cvknode15=172.16.51.15:6789/0,cvknode16=172.16.51.16:6789/0,cvknode17=172.16.51.17:6789/0}
election epoch 862, quorum 0,1,2 cvknode15,cvknode16,cvknode17
osdmap e196279: 347 osds: 347 up, 347 in
pgmap v5891025: 33272 pgs, 16 pools, 26944 GB data, 6822 kobjects
65966 GB used, 579 TB / 644 TB avail
33270 active+clean
2 active+clean+scrubbing+deep
client io 840 kB/s rd, 739 kB/s wr, 35 op/s rd, 184 op/s wr
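As a quick sanity check on the warning above, here is a minimal Python sketch that parses the health line and works out how far the store is over the threshold. The regular expression and the function name parse_store_warning are only my own illustration and assume the exact message format shown in the output above; they are not part of Ceph.

```python
import re

# Health line copied from the `ceph -s` output above.
LINE = "mon.cvknode17 store is getting too big! 34104 MB >= 15360 MB"

# Hypothetical pattern; assumes the exact message format seen above.
PATTERN = re.compile(
    r"^(?P<mon>mon\.\S+) store is getting too big! "
    r"(?P<size>\d+) MB >= (?P<limit>\d+) MB$"
)


def parse_store_warning(line):
    """Return (monitor name, store size in MB, threshold in MB), or None."""
    m = PATTERN.match(line.strip())
    if m is None:
        return None
    return m.group("mon"), int(m.group("size")), int(m.group("limit"))


if __name__ == "__main__":
    mon, size_mb, limit_mb = parse_store_warning(LINE)
    ratio = size_mb / limit_mb
    print(f"{mon}: {size_mb} MB is {ratio:.2f}x the {limit_mb} MB threshold")
```

For this monitor that works out to roughly 2.2 times the warning threshold.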
root@cvknode17:~# top
top - 15:19:28 up 23 days, 23:58, 6 users, load average: 1.08, 1.40, 1.77
Tasks: 346 total, 2 running, 342 sleeping, 0 stopped, 2 zombie
Cpu(s): 8.1%us, 10.8%sy, 0.0%ni, 69.0%id, 9.5%wa, 0.0%hi, 2.5%si, 0.0%st
Mem: 65384424k total, 58102880k used, 7281544k free, 240720k buffers
Swap: 29999100k total, 344944k used, 29654156k free, 24274272k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24407 root 20 0 17.3g 12g 10m S 98 20.2 8420:11 ceph-mon
root@cvknode17:~# top -Hp 24407
top - 15:19:49 up 23 days, 23:59, 6 users, load average: 1.12, 1.39, 1.76
Tasks: 17 total, 1 running, 16 sleeping, 0 stopped, 0 zombie
Cpu(s): 8.1%us, 10.8%sy, 0.0%ni, 69.0%id, 9.5%wa, 0.0%hi, 2.5%si, 0.0%st
Mem: 65384424k total, 58104868k used, 7279556k free, 240744k buffers
Swap: 29999100k total, 344944k used, 29654156k free, 24271188k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
25931 root 20 0 17.3g 12g 9m R 98 20.2 7957:37 ceph-mon
24514 root 20 0 17.3g 12g 9m S 2 20.2 3:06.75 ceph-mon
25932 root 20 0 17.3g 12g 9m S 2 20.2 1:07.82 ceph-mon
24407 root 20 0 17.3g 12g 9m S 0 20.2 0:00.67 ceph-mon
24508 root 20 0 17.3g 12g 9m S 0 20.2 15:50.24 ceph-mon
24513 root 20 0 17.3g 12g 9m S 0 20.2 0:07.88 ceph-mon
24534 root 20 0 17.3g 12g 9m S 0 20.2 196:33.85 ceph-mon
24535 root 20 0 17.3g 12g 9m S 0 20.2 0:00.01 ceph-mon
25929 root 20 0 17.3g 12g 9m S 0 20.2 3:06.09 ceph-mon
25930 root 20 0 17.3g 12g 9m S 0 20.2 8:12.58 ceph-mon
25933 root 20 0 17.3g 12g 9m S 0 20.2 4:42.22 ceph-mon
25934 root 20 0 17.3g 12g 9m S 0 20.2 40:53.27 ceph-mon
25935 root 20 0 17.3g 12g 9m S 0 20.2 0:04.84 ceph-mon
25936 root 20 0 17.3g 12g 9m S 0 20.2 0:00.01 ceph-mon
25980 root 20 0 17.3g 12g 9m S 0 20.2 0:06.65 ceph-mon
25986 root 20 0 17.3g 12g 9m S 0 20.2 48:26.77 ceph-mon
55738 root 20 0 17.3g 12g 9m S 0 20.2 0:09.06 ceph-mon
The backtrace of the busy thread (LWP 25931), captured with gdb:
Thread 20 (Thread 0x7f3e77e80700 (LWP 25931)):
#0 0x00007f3e7e83a653 in pread64 () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00000000009286cf in ?? ()
#2 0x000000000092c187 in leveldb::ReadBlock(leveldb::RandomAccessFile*,
leveldb::ReadOptions const&, leveldb::BlockHandle const&, leveldb::Block**) ()
#3 0x0000000000922f41 in leveldb::Table::BlockReader(void*,
leveldb::ReadOptions const&, leveldb::Slice const&) ()
#4 0x0000000000924840 in ?? ()
#5 0x0000000000924b39 in ?? ()
#6 0x0000000000924a7a in ?? ()
#7 0x00000000009227d0 in ?? ()
#8 0x00000000009140b6 in ?? ()
#9 0x00000000009143dd in ?? ()
#10 0x000000000088d399 in
LevelDBStore::LevelDBWholeSpaceIteratorImpl::lower_bound(std::string const&,
std::string const&) ()
#11 0x000000000088bf00 in LevelDBStore::get(std::string const&,
std::set<std::string, std::less<std::string>, std::allocator<std::string> >
const&, std::map<std::string, ceph::buffer::list, std::less<std::string>,
std::allocator<std::pair<std::string const, ceph::buffer::list> > >*) ()
#12 0x000000000056a7a2 in MonitorDBStore::get(std::string const&, std::string
const&) ()
#13 0x00000000005dcf61 in PaxosService::refresh(bool*) ()
#14 0x000000000058a76b in Monitor::refresh_from_paxos(bool*) ()
#15 0x00000000005c55ac in Paxos::do_refresh() ()
#16 0x00000000005cc093 in Paxos::handle_commit(MMonPaxos*) ()
#17 0x00000000005d4d8b in Paxos::dispatch(PaxosServiceMessage*) ()
#18 0x00000000005ac204 in Monitor::dispatch(MonSession*, Message*, bool) ()
#19 0x00000000005a9b09 in Monitor::_ms_dispatch(Message*) ()
#20 0x00000000005c48a2 in Monitor::ms_dispatch(Message*) ()
#21 0x00000000008b2e67 in Messenger::ms_deliver_dispatch(Message*) ()
#22 0x00000000008b000a in DispatchQueue::entry() ()
#23 0x00000000007a069d in DispatchQueue::DispatchThread::entry() ()
#24 0x00007f3e7e832e9a in start_thread () from
/lib/x86_64-linux-gnu/libpthread.so.0
#25 0x00007f3e7cff638d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#26 0x0000000000000000 in ?? ()
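To show how I matched the busy thread to the backtrace above: in `top -H` the PID column holds the thread ID, which is the same number gdb reports as the LWP. The sketch below is only an illustration of that matching step; the helper names busiest_thread and lwp_of are made up for this mail, and the sample rows are a subset copied from the output above.

```python
import re

# A few rows copied from the `top -Hp 24407` output above.
TOP_ROWS = """\
25931 root 20 0 17.3g 12g 9m R 98 20.2 7957:37 ceph-mon
24514 root 20 0 17.3g 12g 9m S 2 20.2 3:06.75 ceph-mon
24534 root 20 0 17.3g 12g 9m S 0 20.2 196:33.85 ceph-mon
"""

# Thread header line copied from the backtrace above.
GDB_HEADER = "Thread 20 (Thread 0x7f3e77e80700 (LWP 25931)):"


def busiest_thread(top_rows):
    """Return (tid, cpu_percent) for the row with the highest %CPU value."""
    best = None
    for row in top_rows.splitlines():
        fields = row.split()
        # Expect 12 columns: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
        if len(fields) < 12 or not fields[0].isdigit():
            continue  # skip header or blank lines
        tid, cpu = int(fields[0]), float(fields[8])
        if best is None or cpu > best[1]:
            best = (tid, cpu)
    return best


def lwp_of(header):
    """Extract the LWP number from a gdb 'Thread N (... (LWP x)):' line."""
    m = re.search(r"\(LWP (\d+)\)", header)
    return int(m.group(1)) if m else None


if __name__ == "__main__":
    tid, cpu = busiest_thread(TOP_ROWS)
    print(f"busiest thread: {tid} at {cpu:.0f}% CPU")
    print("matches backtrace LWP:", tid == lwp_of(GDB_HEADER))
```

So the thread using the CPU is the one that is inside LevelDBStore::get, called from PaxosService::refresh, in the backtrace.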
Thanks
Best regards
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com