Hi Reed,

there's not much sense in attaching the logs to the mentioned tickets - the problem with the assertion is well known and has already been fixed.

Your current issue is the odd config update behavior, which prevents you from applying the workaround. Feel free to open a ticket about that, but I don't think that's the most efficient route - IIUC the problem isn't common and is likely caused by something specific to your setup, which means a fix probably wouldn't arrive soon enough. Unfortunately that's not my area of expertise either, so I'm of little help here as well.

Nevertheless, if I were troubleshooting this config update issue, I'd start the investigation by trying different parameters/daemons/hosts. Are you able to tune any parameter at all? Does it work on a different host or OSD?
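For instance, something along these lines (a rough sketch - osd.12/osd.42 and debug_bluestore below are just placeholders, substitute daemons and an option relevant to your cluster):

```shell
# Try a harmless, unrelated option first to see whether *any* config
# update propagates (debug_bluestore is only an example here).
ceph config set osd.12 debug_bluestore 10/10

# What the monitors' config store holds:
ceph config get osd.12 debug_bluestore

# What the running daemon actually uses:
ceph tell osd.12 config show | grep debug_bluestore

# Repeat against another OSD, ideally on a different host, to see
# whether the problem follows the daemon or the host:
ceph config set osd.42 debug_bluestore 10/10
ceph tell osd.42 config show | grep debug_bluestore
```

If the mon store shows the new value but the daemon doesn't (or vice versa), that already narrows down where the update is getting lost.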

Not to mention that you might just try restarting the monitors first ;)
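E.g., assuming systemd-managed daemons (adjust the unit name for a cephadm/containerized setup; $(hostname -s) assumes the mon id matches the short hostname):

```shell
# Restart monitors one at a time, confirming quorum between each:
sudo systemctl restart ceph-mon@$(hostname -s)
ceph quorum_status --format json-pretty | grep -A5 quorum_names
```

Restarting them one by one keeps the cluster in quorum throughout.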


Thanks,

Igor

On 10/01/2024 21:38, Reed Dier wrote:
Hi Igor,

That’s correct (shown below).
Would it be helpful for me to add logs/uploaded crash UUID’s to 53906 <https://tracker.ceph.com/issues/53906>, 53907 <https://tracker.ceph.com/issues/53907>, 54209 <https://tracker.ceph.com/issues/54209>, 62928 <https://tracker.ceph.com/issues/62928>, 63110 <https://tracker.ceph.com/issues/63110>, 63161 <https://tracker.ceph.com/issues/63161>, 63352 <https://tracker.ceph.com/issues/63352>? Or maybe open a new tracker to track that the parameter change isn’t being properly persisted or whatever appears to be happening?

Thanks,
Reed

/build/ceph-16.2.14/src/os/bluestore/BlueStore.h: 3870: FAILED ceph_assert(cur >= p.length)

 ceph version 16.2.14 (238ba602515df21ea7ffc75c88db29f9e5ef12c9) pacific (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x55d51970a987]
 2: /usr/bin/ceph-osd(+0xad3b8f) [0x55d51970ab8f]
 3: (RocksDBBlueFSVolumeSelector::sub_usage(void*, bluefs_fnode_t const&)+0x112) [0x55d519e040f2]
 4: (BlueFS::_flush_range_F(BlueFS::FileWriter*, unsigned long, unsigned long)+0x69d) [0x55d519ea0fad]
 5: (BlueFS::_flush_F(BlueFS::FileWriter*, bool, bool*)+0xaa) [0x55d519ea14ea]
 6: (BlueFS::fsync(BlueFS::FileWriter*)+0x7d) [0x55d519ec61ed]
 7: (BlueRocksWritableFile::Sync()+0x19) [0x55d519ed5a59]
 8: (rocksdb::LegacyWritableFileWrapper::Sync(rocksdb::IOOptions const&, rocksdb::IODebugContext*)+0x52) [0x55d51a3e37ce]
 9: (rocksdb::WritableFileWriter::SyncInternal(bool)+0x216) [0x55d51a5eddac]
 10: (rocksdb::WritableFileWriter::Sync(bool)+0x17b) [0x55d51a5ed785]
 11: (rocksdb::DBImpl::WriteToWAL(rocksdb::WriteThread::WriteGroup const&, rocksdb::log::Writer*, unsigned long*, bool, bool, unsigned long)+0x39a) [0x55d51a441bf8]
 12: (rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*)+0x135e) [0x55d51a43d96c]
 13: (rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x5d) [0x55d51a43c56f]
 14: (RocksDBStore::submit_common(rocksdb::WriteOptions&, std::shared_ptr<KeyValueDB::TransactionImpl>)+0x85) [0x55d51a388635]
 15: (RocksDBStore::submit_transaction_sync(std::shared_ptr<KeyValueDB::TransactionImpl>)+0x9b) [0x55d51a38904b]
 16: (BlueStore::_kv_sync_thread()+0x22bc) [0x55d519e016dc]
 17: (BlueStore::KVSyncThread::entry()+0x11) [0x55d519e2de71]
 18: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7f490cf23609]
 19: clone()

     0> 2024-01-10T11:39:05.922-0500 7f48f978d700 -1 *** Caught signal (Aborted) **
 in thread 7f48f978d700 thread_name:bstore_kv_sync

 ceph version 16.2.14 (238ba602515df21ea7ffc75c88db29f9e5ef12c9) pacific (stable)
 1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420) [0x7f490cf2f420]
 2: gsignal()
 3: abort()
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1ad) [0x55d51970a9e2]
 5: /usr/bin/ceph-osd(+0xad3b8f) [0x55d51970ab8f]
 6: (RocksDBBlueFSVolumeSelector::sub_usage(void*, bluefs_fnode_t const&)+0x112) [0x55d519e040f2]
 7: (BlueFS::_flush_range_F(BlueFS::FileWriter*, unsigned long, unsigned long)+0x69d) [0x55d519ea0fad]
 8: (BlueFS::_flush_F(BlueFS::FileWriter*, bool, bool*)+0xaa) [0x55d519ea14ea]
 9: (BlueFS::fsync(BlueFS::FileWriter*)+0x7d) [0x55d519ec61ed]
 10: (BlueRocksWritableFile::Sync()+0x19) [0x55d519ed5a59]
 11: (rocksdb::LegacyWritableFileWrapper::Sync(rocksdb::IOOptions const&, rocksdb::IODebugContext*)+0x52) [0x55d51a3e37ce]
 12: (rocksdb::WritableFileWriter::SyncInternal(bool)+0x216) [0x55d51a5eddac]
 13: (rocksdb::WritableFileWriter::Sync(bool)+0x17b) [0x55d51a5ed785]
 14: (rocksdb::DBImpl::WriteToWAL(rocksdb::WriteThread::WriteGroup const&, rocksdb::log::Writer*, unsigned long*, bool, bool, unsigned long)+0x39a) [0x55d51a441bf8]
 15: (rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*)+0x135e) [0x55d51a43d96c]
 16: (rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x5d) [0x55d51a43c56f]
 17: (RocksDBStore::submit_common(rocksdb::WriteOptions&, std::shared_ptr<KeyValueDB::TransactionImpl>)+0x85) [0x55d51a388635]
 18: (RocksDBStore::submit_transaction_sync(std::shared_ptr<KeyValueDB::TransactionImpl>)+0x9b) [0x55d51a38904b]
 19: (BlueStore::_kv_sync_thread()+0x22bc) [0x55d519e016dc]
 20: (BlueStore::KVSyncThread::entry()+0x11) [0x55d519e2de71]
 21: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7f490cf23609]
 22: clone()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

On Jan 10, 2024, at 12:06 PM, Igor Fedotov <[email protected]> wrote:

Hi Reed,

it looks to me like your settings aren't taking effect. You might want to check the OSD log rather than the crash info and look at the assertion's backtrace.

Does it mention RocksDBBlueFSVolumeSelector, like the one in https://tracker.ceph.com/issues/53906:

ceph version 17.0.0-10229-g7e035110 (7e035110784fba02ba81944e444be9a36932c6a3) quincy (dev)
  1: /lib64/libpthread.so.0(+0x12c20) [0x7f2beb318c20]
  2: gsignal()
  3: abort()
  4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1b0) [0x56347eb33bec]
  5: /usr/bin/ceph-osd(+0x5d5daf) [0x56347eb33daf]
  6: (RocksDBBlueFSVolumeSelector::add_usage(void*, bluefs_fnode_t const&)+0) [0x56347f1f7d00]
  7: (BlueFS::_flush_range_F(BlueFS::FileWriter*, unsigned long, unsigned long)+0x735) [0x56347f295b45]


If so - then the parameter change still isn't being applied properly.

Thanks
Igor

On 10/01/2024 20:13, Reed Dier wrote:
Well, sadly, that setting doesn’t seem to resolve the issue.

I set the value in ceph.conf for the OSDs with small WAL/DB devices that keep running into the issue,

$  ceph tell osd.12 config show | grep bluestore_volume_selection_policy
     "bluestore_volume_selection_policy": "rocksdb_original",
$ ceph crash info 2024-01-10T16:39:05.925534Z_f0c57ca3-b7e6-4511-b7ae-5834541d6c67 | egrep "(assert_condition|entity_name)"
     "assert_condition": "cur >= p.length",
     "entity_name": "osd.12",
So, I guess that configuration item doesn’t in fact prevent the crash as was purported.
Looks like I may need to fast track moving to quincy…

Reed
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
