Sage Weil writes:
> What happens if you do
> ceph-kvstore-tool rocksdb /mnt/ceph/db stats
(I'm afraid our ceph-kvstore-tool doesn't know about a "stats"
command, but it still tries to open the database.)
That aborts after complaining about many missing files in /mnt/ceph/db.
When I symlink the files from db.slow into db with
( cd /mnt/ceph/db && sudo ln -s ../db.slow/* . ) and re-run,
it still aborts, just without complaining about missing files.
I'm attaching the output (stdout+stderr combined), in case that helps.
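For anyone repeating the symlink workaround, here is a minimal Python sketch of the same step. It assumes the export layout used here (a `db` directory with a sibling `db.slow`), and `link_slow_files` is a hypothetical helper, not part of any Ceph tool. Like the `ln -s ../db.slow/*` one-liner it creates relative links, but it also reports which names were skipped because they already exist in `db`.

```python
import os

def link_slow_files(db_dir, slow_name="db.slow"):
    """Hypothetical helper: symlink every entry of the sibling db.slow
    directory into db_dir, in the spirit of
        ( cd db_dir && ln -s ../db.slow/* . )
    Returns (linked, skipped): names that were linked, and names skipped
    because an entry with that name already exists in db_dir.
    """
    # db.slow is assumed to sit next to db_dir (e.g. /mnt/ceph/db.slow).
    slow_dir = os.path.join(os.path.dirname(os.path.abspath(db_dir)), slow_name)
    linked, skipped = [], []
    for name in sorted(os.listdir(slow_dir)):
        link_path = os.path.join(db_dir, name)
        if os.path.lexists(link_path):
            skipped.append(name)
            continue
        # Relative target, like the shell one-liner, resolved from db_dir.
        os.symlink(os.path.join("..", slow_name, name), link_path)
        linked.append(name)
    return linked, skipped
```

Run as root, e.g. `linked, skipped = link_slow_files("/mnt/ceph/db")`; a non-empty `skipped` list would point at name clashes between the two directories worth looking at before re-running the tool.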
> or, if that works,
> ceph-kvstore-tool rocksdb /mnt/ceph/db compact
> It looks like bluefs is happy (in that it can read the whole set
> of rocksdb files), so the question is whether rocksdb can open them, or
> if there's some corruption or problem at the rocksdb level.
> The original crash is actually here:
> ...
> 9: (tc_new()+0x283) [0x7fbdbed8e943]
> 10: (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_mutate(unsigned long, unsigned long, char const*, unsigned long)+0x69) [0x5600b1268109]
> 11: (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_append(char const*, unsigned long)+0x63) [0x5600b12f5b43]
> 12: (rocksdb::BlockBuilder::Add(rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::Slice const*)+0x10b) [0x5600b1eaca9b]
> ...
> where tc_new is (I think) tcmalloc, which looks to me like rocksdb
> is probably trying to allocate something very big. The question is whether
> that will happen with the exported files or only on bluefs...
Yes, that's what I was thinking as well. The server seems to have about
50GB of free RAM though, so maybe it was more like <UNDEFINED>ly big :-)
Also, your ceph-kvstore-tool command seems to have crashed somewhere
else (in the destructor of a rocksdb::Version object?):
2019-06-12 23:40:43.555 7f724b27f0c0 1 rocksdb: do_open column families: [default]
Unrecognized command: stats
ceph-kvstore-tool: /build/ceph-14.2.1/src/rocksdb/db/version_set.cc:356: rocksdb::Version::~Version(): Assertion `path_id < cfd_->ioptions()->cf_paths.size()' failed.
*** Caught signal (Aborted) **
in thread 7f724b27f0c0 thread_name:ceph-kvstore-to
ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus (stable)
1: (()+0x12890) [0x7f7240c6f890]
2: (gsignal()+0xc7) [0x7f723fb5fe97]
3: (abort()+0x141) [0x7f723fb61801]
4: (()+0x3039a) [0x7f723fb5139a]
5: (()+0x30412) [0x7f723fb51412]
6: (rocksdb::Version::~Version()+0x224) [0x559749529fe4]
7: (rocksdb::Version::Unref()+0x35) [0x55974952a065]
8: (rocksdb::SuperVersion::Cleanup()+0x68) [0x55974960f328]
9: (rocksdb::ColumnFamilyData::~ColumnFamilyData()+0xf4) [0x5597496123d4]
10: (rocksdb::ColumnFamilySet::~ColumnFamilySet()+0xb8) [0x559749612ba8]
11: (rocksdb::VersionSet::~VersionSet()+0x4d) [0x55974951da5d]
12: (rocksdb::DBImpl::CloseHelper()+0x6a8) [0x55974944a868]
13: (rocksdb::DBImpl::~DBImpl()+0x65b) [0x559749455deb]
14: (rocksdb::DBImpl::~DBImpl()+0x11) [0x559749455e21]
15: (RocksDBStore::~RocksDBStore()+0xe9) [0x559749265349]
16: (RocksDBStore::~RocksDBStore()+0x9) [0x559749265599]
17: (main()+0x307) [0x5597490b5fb7]
18: (__libc_start_main()+0xe7) [0x7f723fb42b97]
19: (_start()+0x2a) [0x55974918e03a]
2019-06-12 23:40:51.363 7f724b27f0c0 -1 *** Caught signal (Aborted) **
in thread 7f724b27f0c0 thread_name:ceph-kvstore-to
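One possible reading of that assertion, offered as a hypothesis rather than a diagnosis: in RocksDB each table file records a `path_id` that indexes into the column family's list of data paths (`cf_paths`), and as far as I understand BlueStore opens its database with more than one path (`db` and `db.slow`, matching the two exported directories). If the files exported from db.slow still carry `path_id` 1 while the standalone tool opens the store with only /mnt/ceph/db, that list has a single entry and the check in `~Version()` fails. The toy Python model below only mirrors the invariant from the abort message; `FileMeta` and `files_violating_path_ids` are made-up names, not RocksDB's API, and the `path_id` values are assumed for illustration.

```python
from dataclasses import dataclass

@dataclass
class FileMeta:
    # Toy stand-in for a table file's metadata (not RocksDB's real type).
    number: int   # table file number, e.g. 10 for 000010.sst
    path_id: int  # index into the column family's list of data paths

def files_violating_path_ids(files, cf_paths):
    """Return the files whose path_id would fail the check
    `path_id < cf_paths.size()` quoted in the abort message."""
    return [f for f in files if not f.path_id < len(cf_paths)]

# Assumed layout: one file on the fast device, one that spilled to db.slow.
files = [FileMeta(number=10, path_id=0), FileMeta(number=11, path_id=1)]

# Opened with a single path: the db.slow file has no matching entry.
print(files_violating_path_ids(files, ["/mnt/ceph/db"]))
# -> [FileMeta(number=11, path_id=1)]

# Opened with both paths: every path_id is in range.
print(files_violating_path_ids(files, ["/mnt/ceph/db", "/mnt/ceph/db.slow"]))
# -> []
```

If this reading is right, the symlinks alone would not be enough; the store would need to be opened with the same set of paths BlueStore used.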
> Thanks!
Thanks so much for looking into this!
We hope to regain at least some access to our S3 bucket indexes, possibly
by somehow dropping and re-creating those indexes.
--
Simon.
2019-06-12 23:40:43.555 7f724b27f0c0 1 rocksdb: do_open column families:
[default]
Unrecognized command: stats
ceph-kvstore-tool: /build/ceph-14.2.1/src/rocksdb/db/version_set.cc:356:
rocksdb::Version::~Version(): Assertion `path_id <
cfd_->ioptions()->cf_paths.size()' failed.
*** Caught signal (Aborted) **
in thread 7f724b27f0c0 thread_name:ceph-kvstore-to
ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus
(stable)
1: (()+0x12890) [0x7f7240c6f890]
2: (gsignal()+0xc7) [0x7f723fb5fe97]
3: (abort()+0x141) [0x7f723fb61801]
4: (()+0x3039a) [0x7f723fb5139a]
5: (()+0x30412) [0x7f723fb51412]
6: (rocksdb::Version::~Version()+0x224) [0x559749529fe4]
7: (rocksdb::Version::Unref()+0x35) [0x55974952a065]
8: (rocksdb::SuperVersion::Cleanup()+0x68) [0x55974960f328]
9: (rocksdb::ColumnFamilyData::~ColumnFamilyData()+0xf4) [0x5597496123d4]
10: (rocksdb::ColumnFamilySet::~ColumnFamilySet()+0xb8) [0x559749612ba8]
11: (rocksdb::VersionSet::~VersionSet()+0x4d) [0x55974951da5d]
12: (rocksdb::DBImpl::CloseHelper()+0x6a8) [0x55974944a868]
13: (rocksdb::DBImpl::~DBImpl()+0x65b) [0x559749455deb]
14: (rocksdb::DBImpl::~DBImpl()+0x11) [0x559749455e21]
15: (RocksDBStore::~RocksDBStore()+0xe9) [0x559749265349]
16: (RocksDBStore::~RocksDBStore()+0x9) [0x559749265599]
17: (main()+0x307) [0x5597490b5fb7]
18: (__libc_start_main()+0xe7) [0x7f723fb42b97]
19: (_start()+0x2a) [0x55974918e03a]
2019-06-12 23:40:51.363 7f724b27f0c0 -1 *** Caught signal (Aborted) **
in thread 7f724b27f0c0 thread_name:ceph-kvstore-to
ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus
(stable)
1: (()+0x12890) [0x7f7240c6f890]
2: (gsignal()+0xc7) [0x7f723fb5fe97]
3: (abort()+0x141) [0x7f723fb61801]
4: (()+0x3039a) [0x7f723fb5139a]
5: (()+0x30412) [0x7f723fb51412]
6: (rocksdb::Version::~Version()+0x224) [0x559749529fe4]
7: (rocksdb::Version::Unref()+0x35) [0x55974952a065]
8: (rocksdb::SuperVersion::Cleanup()+0x68) [0x55974960f328]
9: (rocksdb::ColumnFamilyData::~ColumnFamilyData()+0xf4) [0x5597496123d4]
10: (rocksdb::ColumnFamilySet::~ColumnFamilySet()+0xb8) [0x559749612ba8]
11: (rocksdb::VersionSet::~VersionSet()+0x4d) [0x55974951da5d]
12: (rocksdb::DBImpl::CloseHelper()+0x6a8) [0x55974944a868]
13: (rocksdb::DBImpl::~DBImpl()+0x65b) [0x559749455deb]
14: (rocksdb::DBImpl::~DBImpl()+0x11) [0x559749455e21]
15: (RocksDBStore::~RocksDBStore()+0xe9) [0x559749265349]
16: (RocksDBStore::~RocksDBStore()+0x9) [0x559749265599]
17: (main()+0x307) [0x5597490b5fb7]
18: (__libc_start_main()+0xe7) [0x7f723fb42b97]
19: (_start()+0x2a) [0x55974918e03a]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
interpret this.
--- begin dump of recent events ---
-23> 2019-06-12 23:40:43.531 7f724b27f0c0 5 asok(0x55974af78000)
register_command assert hook 0x55974ac02130
-22> 2019-06-12 23:40:43.531 7f724b27f0c0 5 asok(0x55974af78000)
register_command abort hook 0x55974ac02130
-21> 2019-06-12 23:40:43.531 7f724b27f0c0 5 asok(0x55974af78000)
register_command perfcounters_dump hook 0x55974ac02130
-20> 2019-06-12 23:40:43.531 7f724b27f0c0 5 asok(0x55974af78000)
register_command 1 hook 0x55974ac02130
-19> 2019-06-12 23:40:43.531 7f724b27f0c0 5 asok(0x55974af78000)
register_command perf dump hook 0x55974ac02130
-18> 2019-06-12 23:40:43.531 7f724b27f0c0 5 asok(0x55974af78000)
register_command perfcounters_schema hook 0x55974ac02130
-17> 2019-06-12 23:40:43.531 7f724b27f0c0 5 asok(0x55974af78000)
register_command perf histogram dump hook 0x55974ac02130
-16> 2019-06-12 23:40:43.531 7f724b27f0c0 5 asok(0x55974af78000)
register_command 2 hook 0x55974ac02130
-15> 2019-06-12 23:40:43.531 7f724b27f0c0 5 asok(0x55974af78000)
register_command perf schema hook 0x55974ac02130
-14> 2019-06-12 23:40:43.531 7f724b27f0c0 5 asok(0x55974af78000)
register_command perf histogram schema hook 0x55974ac02130
-13> 2019-06-12 23:40:43.531 7f724b27f0c0 5 asok(0x55974af78000)
register_command perf reset hook 0x55974ac02130
-12> 2019-06-12 23:40:43.531 7f724b27f0c0 5 asok(0x55974af78000)
register_command config show hook 0x55974ac02130
-11> 2019-06-12 23:40:43.531 7f724b27f0c0 5 asok(0x55974af78000)
register_command config help hook 0x55974ac02130
-10> 2019-06-12 23:40:43.531 7f724b27f0c0 5 asok(0x55974af78000)
register_command config set hook 0x55974ac02130
-9> 2019-06-12 23:40:43.531 7f724b27f0c0 5 asok(0x55974af78000)
register_command config unset hook 0x55974ac02130
-8> 2019-06-12 23:40:43.531 7f724b27f0c0 5 asok(0x55974af78000)
register_command config get hook 0x55974ac02130
-7> 2019-06-12 23:40:43.531 7f724b27f0c0 5 asok(0x55974af78000)
register_command config diff hook 0x55974ac02130
-6> 2019-06-12 23:40:43.531 7f724b27f0c0 5 asok(0x55974af78000)
register_command config diff get hook 0x55974ac02130
-5> 2019-06-12 23:40:43.531 7f724b27f0c0 5 asok(0x55974af78000)
register_command log flush hook 0x55974ac02130
-4> 2019-06-12 23:40:43.531 7f724b27f0c0 5 asok(0x55974af78000)
register_command log dump hook 0x55974ac02130
-3> 2019-06-12 23:40:43.531 7f724b27f0c0 5 asok(0x55974af78000)
register_command log reopen hook 0x55974ac02130
-2> 2019-06-12 23:40:43.531 7f724b27f0c0 5 asok(0x55974af78000)
register_command dump_mempools hook 0x55974ba7c068
-1> 2019-06-12 23:40:43.555 7f724b27f0c0 1 rocksdb: do_open column
families: [default]
0> 2019-06-12 23:40:51.363 7f724b27f0c0 -1 *** Caught signal (Aborted) **
in thread 7f724b27f0c0 thread_name:ceph-kvstore-to
ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus
(stable)
1: (()+0x12890) [0x7f7240c6f890]
2: (gsignal()+0xc7) [0x7f723fb5fe97]
3: (abort()+0x141) [0x7f723fb61801]
4: (()+0x3039a) [0x7f723fb5139a]
5: (()+0x30412) [0x7f723fb51412]
6: (rocksdb::Version::~Version()+0x224) [0x559749529fe4]
7: (rocksdb::Version::Unref()+0x35) [0x55974952a065]
8: (rocksdb::SuperVersion::Cleanup()+0x68) [0x55974960f328]
9: (rocksdb::ColumnFamilyData::~ColumnFamilyData()+0xf4) [0x5597496123d4]
10: (rocksdb::ColumnFamilySet::~ColumnFamilySet()+0xb8) [0x559749612ba8]
11: (rocksdb::VersionSet::~VersionSet()+0x4d) [0x55974951da5d]
12: (rocksdb::DBImpl::CloseHelper()+0x6a8) [0x55974944a868]
13: (rocksdb::DBImpl::~DBImpl()+0x65b) [0x559749455deb]
14: (rocksdb::DBImpl::~DBImpl()+0x11) [0x559749455e21]
15: (RocksDBStore::~RocksDBStore()+0xe9) [0x559749265349]
16: (RocksDBStore::~RocksDBStore()+0x9) [0x559749265599]
17: (main()+0x307) [0x5597490b5fb7]
18: (__libc_start_main()+0xe7) [0x7f723fb42b97]
19: (_start()+0x2a) [0x55974918e03a]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
interpret this.
--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 rbd_mirror
0/ 5 rbd_replay
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
1/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 journal
0/ 0 ms
1/ 5 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 1 reserver
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/ 5 rgw_sync
1/10 civetweb
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
0/ 0 refs
1/ 5 xio
1/ 5 compressor
1/ 5 bluestore
1/ 5 bluefs
1/ 3 bdev
1/ 5 kstore
2/ 2 rocksdb
4/ 5 leveldb
4/ 5 memdb
1/ 5 kinetic
1/ 5 fuse
1/ 5 mgr
1/ 5 mgrc
1/ 5 dpdk
1/ 5 eventtrace
-2/-2 (syslog threshold)
99/99 (stderr threshold)
max_recent 500
max_new 1000
log_file
--- end dump of recent events ---
--- begin dump of recent events ---
-23> 2019-06-12 23:40:43.531 7f724b27f0c0 5 asok(0x55974af78000)
register_command assert hook 0x55974ac02130
-22> 2019-06-12 23:40:43.531 7f724b27f0c0 5 asok(0x55974af78000)
register_command abort hook 0x55974ac02130
-21> 2019-06-12 23:40:43.531 7f724b27f0c0 5 asok(0x55974af78000)
register_command perfcounters_dump hook 0x55974ac02130
-20> 2019-06-12 23:40:43.531 7f724b27f0c0 5 asok(0x55974af78000)
register_command 1 hook 0x55974ac02130
-19> 2019-06-12 23:40:43.531 7f724b27f0c0 5 asok(0x55974af78000)
register_command perf dump hook 0x55974ac02130
-18> 2019-06-12 23:40:43.531 7f724b27f0c0 5 asok(0x55974af78000)
register_command perfcounters_schema hook 0x55974ac02130
-17> 2019-06-12 23:40:43.531 7f724b27f0c0 5 asok(0x55974af78000)
register_command perf histogram dump hook 0x55974ac02130
-16> 2019-06-12 23:40:43.531 7f724b27f0c0 5 asok(0x55974af78000)
register_command 2 hook 0x55974ac02130
-15> 2019-06-12 23:40:43.531 7f724b27f0c0 5 asok(0x55974af78000)
register_command perf schema hook 0x55974ac02130
-14> 2019-06-12 23:40:43.531 7f724b27f0c0 5 asok(0x55974af78000)
register_command perf histogram schema hook 0x55974ac02130
-13> 2019-06-12 23:40:43.531 7f724b27f0c0 5 asok(0x55974af78000)
register_command perf reset hook 0x55974ac02130
-12> 2019-06-12 23:40:43.531 7f724b27f0c0 5 asok(0x55974af78000)
register_command config show hook 0x55974ac02130
-11> 2019-06-12 23:40:43.531 7f724b27f0c0 5 asok(0x55974af78000)
register_command config help hook 0x55974ac02130
-10> 2019-06-12 23:40:43.531 7f724b27f0c0 5 asok(0x55974af78000)
register_command config set hook 0x55974ac02130
-9> 2019-06-12 23:40:43.531 7f724b27f0c0 5 asok(0x55974af78000)
register_command config unset hook 0x55974ac02130
-8> 2019-06-12 23:40:43.531 7f724b27f0c0 5 asok(0x55974af78000)
register_command config get hook 0x55974ac02130
-7> 2019-06-12 23:40:43.531 7f724b27f0c0 5 asok(0x55974af78000)
register_command config diff hook 0x55974ac02130
-6> 2019-06-12 23:40:43.531 7f724b27f0c0 5 asok(0x55974af78000)
register_command config diff get hook 0x55974ac02130
-5> 2019-06-12 23:40:43.531 7f724b27f0c0 5 asok(0x55974af78000)
register_command log flush hook 0x55974ac02130
-4> 2019-06-12 23:40:43.531 7f724b27f0c0 5 asok(0x55974af78000)
register_command log dump hook 0x55974ac02130
-3> 2019-06-12 23:40:43.531 7f724b27f0c0 5 asok(0x55974af78000)
register_command log reopen hook 0x55974ac02130
-2> 2019-06-12 23:40:43.531 7f724b27f0c0 5 asok(0x55974af78000)
register_command dump_mempools hook 0x55974ba7c068
-1> 2019-06-12 23:40:43.555 7f724b27f0c0 1 rocksdb: do_open column
families: [default]
0> 2019-06-12 23:40:51.363 7f724b27f0c0 -1 *** Caught signal (Aborted) **
in thread 7f724b27f0c0 thread_name:ceph-kvstore-to
ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus
(stable)
1: (()+0x12890) [0x7f7240c6f890]
2: (gsignal()+0xc7) [0x7f723fb5fe97]
3: (abort()+0x141) [0x7f723fb61801]
4: (()+0x3039a) [0x7f723fb5139a]
5: (()+0x30412) [0x7f723fb51412]
6: (rocksdb::Version::~Version()+0x224) [0x559749529fe4]
7: (rocksdb::Version::Unref()+0x35) [0x55974952a065]
8: (rocksdb::SuperVersion::Cleanup()+0x68) [0x55974960f328]
9: (rocksdb::ColumnFamilyData::~ColumnFamilyData()+0xf4) [0x5597496123d4]
10: (rocksdb::ColumnFamilySet::~ColumnFamilySet()+0xb8) [0x559749612ba8]
11: (rocksdb::VersionSet::~VersionSet()+0x4d) [0x55974951da5d]
12: (rocksdb::DBImpl::CloseHelper()+0x6a8) [0x55974944a868]
13: (rocksdb::DBImpl::~DBImpl()+0x65b) [0x559749455deb]
14: (rocksdb::DBImpl::~DBImpl()+0x11) [0x559749455e21]
15: (RocksDBStore::~RocksDBStore()+0xe9) [0x559749265349]
16: (RocksDBStore::~RocksDBStore()+0x9) [0x559749265599]
17: (main()+0x307) [0x5597490b5fb7]
18: (__libc_start_main()+0xe7) [0x7f723fb42b97]
19: (_start()+0x2a) [0x55974918e03a]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
interpret this.
--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 rbd_mirror
0/ 5 rbd_replay
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
1/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 journal
0/ 0 ms
1/ 5 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 1 reserver
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/ 5 rgw_sync
1/10 civetweb
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
0/ 0 refs
1/ 5 xio
1/ 5 compressor
1/ 5 bluestore
1/ 5 bluefs
1/ 3 bdev
1/ 5 kstore
2/ 2 rocksdb
4/ 5 leveldb
4/ 5 memdb
1/ 5 kinetic
1/ 5 fuse
1/ 5 mgr
1/ 5 mgrc
1/ 5 dpdk
1/ 5 eventtrace
-2/-2 (syslog threshold)
99/99 (stderr threshold)
max_recent 500
max_new 1000
log_file
/var/lib/ceph/crash/2019-06-12_21:40:51.369265Z_0eea9b49-ec97-4654-aee5-89d9207df79a/log
--- end dump of recent events ---
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com