I have 2 OSDs failing to start due to this [1] segfault. What is happening matches what Sage said about this [2] bug. The OSDs are on NVMe disks, and RocksDB is compacting omaps. I attempted setting `bluestore_bluefs_min_free = 10737418240` and then starting the OSDs, but they both segfaulted with the same error. The segfault is immediate on OSD start, happening within 5 seconds. Is there any testing that would be helpful in figuring this out and/or getting these 2 OSDs back up? All data has successfully migrated off of them, so I'm at HEALTH_OK with them marked out.
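For reference, here is roughly how I applied the override before the restart attempt (a minimal sketch, assuming the option is set via ceph.conf on the OSD host; the OSD id 6 is just an example taken from the logs below, and your log path may differ):

    # /etc/ceph/ceph.conf -- reserve ~10 GiB of free space for bluefs
    [osd]
    bluestore_bluefs_min_free = 10737418240

    # then retry one of the failed OSDs and watch its log
    systemctl start ceph-osd@6
    tail -f /var/log/ceph/ceph-osd.6.log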
[1] FAILED assert(0 == "bluefs enospc")
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1600138

On Tue, Aug 14, 2018 at 12:29 PM Igor Fedotov <ifedo...@suse.de> wrote:

> Hi Jakub,
>
> for the crashing OSD could you please set
>
>     debug_bluestore=10
>     bluestore_bluefs_balance_failure_dump_interval=1
>
> and collect more logs. This will hopefully provide more insight on why
> additional space isn't allocated for bluefs.
>
> Thanks,
> Igor
>
> On 8/14/2018 12:41 PM, Jakub Stańczak wrote:
>
> Hello All!
>
> I am using a full-BlueStore Mimic cluster with a pure RGW workload. We use
> the AWS i3 instance family for OSD machines - each instance has 1 NVMe disk
> which is split into 4 partitions, and each of those partitions is devoted
> to a BlueStore block device. We run one OSD per partition, each with just
> that single block device, so everything is managed by BlueStore internally.
>
> The problem is that under write-heavy conditions the DB device grows fast,
> and at some point BlueFS stops getting more space, which results in OSD
> death. There is no recovery from this error - when BlueFS runs out of space
> for RocksDB, the OSD dies and it cannot be restarted.
>
> With this particular OSD there is plenty of free space, but we can see that
> it cannot allocate more space: '_balance_bluefs_freespace no allocate on
> 0x80000000' (the 0x80000000 here is an allocation size of 2 GiB, not an
> address).
>
> I also did some BlueFS tuning, because I previously had similar problems
> where it appeared that BlueStore could not keep up with providing enough
> storage for BlueFS.
>
> bluefs settings:
>
>     bluestore_bluefs_balance_interval = 0.333
>     bluestore_bluefs_gift_ratio = 0.05
>     bluestore_bluefs_min_free = 3221225472
>
> snippet from OSD logs:
>
> 2018-08-13 18:15:10.960 7f6a54073700 0 bluestore(/var/lib/ceph/osd/ceph-6) _balance_bluefs_freespace no allocate on 0x80000000 min_alloc_size 0x2000
> 2018-08-13 18:15:11.330 7f6a54073700 0 bluestore(/var/lib/ceph/osd/ceph-6) _balance_bluefs_freespace no allocate on 0x80000000 min_alloc_size 0x2000
> 2018-08-13 18:15:11.752 7f6a54073700 0 bluestore(/var/lib/ceph/osd/ceph-6) _balance_bluefs_freespace no allocate on 0x80000000 min_alloc_size 0x2000
> 2018-08-13 18:15:11.785 7f6a5b882700 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.1/rpm/el7/BUILD/ceph-13.2.1/src/rocksdb/db/compaction_job.cc:1166] [default] [JOB 41] Generated table #14590: 304401 keys, 68804532 bytes
> 2018-08-13 18:15:11.785 7f6a5b882700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1534184111786253, "cf_name": "default", "job": 41, "event": "table_file_creation", "file_number": 14590, "file_size": 68804532, "table_properties": {"data_size": 67112437, "index_size": 777792, "filter_size": 913252, "raw_key_size": 13383306, "raw_average_key_size": 43, "raw_value_size": 58673606, "raw_average_value_size": 192, "num_data_blocks": 17090, "num_entries": 304401, "filter_policy_name": "rocksdb.BuiltinBloomFilter", "kDeletedKeys": "0", "kMergeOperands": "0"}}
> 2018-08-13 18:15:12.245 7f6a54073700 0 bluestore(/var/lib/ceph/osd/ceph-6) _balance_bluefs_freespace no allocate on 0x80000000 min_alloc_size 0x2000
> 2018-08-13 18:15:12.664 7f6a54073700 0 bluestore(/var/lib/ceph/osd/ceph-6) _balance_bluefs_freespace no allocate on 0x80000000 min_alloc_size 0x2000
> 2018-08-13 18:15:12.743 7f6a5b882700 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.1/rpm/el7/BUILD/ceph-13.2.1/src/rocksdb/db/compaction_job.cc:1166] [default] [JOB 41] Generated table #14591: 313351 keys, 68830515 bytes
> 2018-08-13 18:15:12.743 7f6a5b882700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1534184112744129, "cf_name": "default", "job": 41, "event": "table_file_creation", "file_number": 14591, "file_size": 68830515, "table_properties": {"data_size": 67109446, "index_size": 785852, "filter_size": 934166, "raw_key_size": 13762246, "raw_average_key_size": 43, "raw_value_size": 58469928, "raw_average_value_size": 186, "num_data_blocks": 17124, "num_entries": 313351, "filter_policy_name": "rocksdb.BuiltinBloomFilter", "kDeletedKeys": "0", "kMergeOperands": "0"}}
> 2018-08-13 18:15:13.025 7f6a54073700 0 bluestore(/var/lib/ceph/osd/ceph-6) _balance_bluefs_freespace no allocate on 0x80000000 min_alloc_size 0x2000
> 2018-08-13 18:15:13.405 7f6a5b882700 1 bluefs _allocate failed to allocate 0x4200000 on bdev 1, free 0x3500000; fallback to bdev 2
> 2018-08-13 18:15:13.405 7f6a5b882700 -1 bluefs _allocate failed to allocate 0x4200000 on bdev 2, dne
> 2018-08-13 18:15:13.405 7f6a5b882700 -1 bluefs _flush_range allocated: 0x0 offset: 0x0 length: 0x419db1f
> 2018-08-13 18:15:13.405 7f6a54073700 0 bluestore(/var/lib/ceph/osd/ceph-6) _balance_bluefs_freespace no allocate on 0x80000000 min_alloc_size 0x2000
> 2018-08-13 18:15:13.409 7f6a5b882700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.1/rpm/el7/BUILD/ceph-13.2.1/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_flush_range(BlueFS::FileWriter*, uint64_t, uint64_t)' thread 7f6a5b882700 time 2018-08-13 18:15:13.406645
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.1/rpm/el7/BUILD/ceph-13.2.1/src/os/bluestore/BlueFS.cc: 1663: FAILED assert(0 == "bluefs enospc")
>
> ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0xff) [0x7f6a6b660e1f]
> 2: (()+0x284fe7) [0x7f6a6b660fe7]
> 3: (BlueFS::_flush_range(BlueFS::FileWriter*, unsigned long, unsigned long)+0x1ac6) [0x55f6c6db9146]
> 4: (BlueRocksWritableFile::Flush()+0x3d) [0x55f6c6dcf0cd]
> 5: (rocksdb::WritableFileWriter::Flush()+0x196) [0x55f6c6faf7c6]
> 6: (rocksdb::WritableFileWriter::Sync(bool)+0x2e) [0x55f6c6fafa8e]
> 7: (rocksdb::CompactionJob::FinishCompactionOutputFile(rocksdb::Status const&, rocksdb::CompactionJob::SubcompactionState*, rocksdb::RangeDelAggregator*, CompactionIterationStats*, rocksdb::Slice const*)+0x73b) [0x55f6c6fed26b]
> 8: (rocksdb::CompactionJob::ProcessKeyValueCompaction(rocksdb::CompactionJob::SubcompactionState*)+0x77f) [0x55f6c6feff3f]
> 9: (rocksdb::CompactionJob::Run()+0x2c8) [0x55f6c6ff1508]
> 10: (rocksdb::DBImpl::BackgroundCompaction(bool*, rocksdb::JobContext*, rocksdb::LogBuffer*, rocksdb::DBImpl::PrepickedCompaction*)+0xab4) [0x55f6c6e57da4]
> 11: (rocksdb::DBImpl::BackgroundCallCompaction(rocksdb::DBImpl::PrepickedCompaction*, rocksdb::Env::Priority)+0xd0) [0x55f6c6e59680]
> 12: (rocksdb::DBImpl::BGWorkCompaction(void*)+0x3a) [0x55f6c6e59b6a]
> 13: (rocksdb::ThreadPoolImpl::Impl::BGThread(unsigned long)+0x266) [0x55f6c7034536]
> 14: (rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper(void*)+0x4f) [0x55f6c70346bf]
> 15: (()+0x6ae17f) [0x7f6a6ba8a17f]
> 16: (()+0x7e25) [0x7f6a681c5e25]
> 17: (clone()+0x6d) [0x7f6a672b5bad]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> Has anyone stumbled upon a similar problem? It looks like a bug to me - it
> has happened on several OSDs already, always with a different BlueFS size
> and a different OSD utilization.
>
> Best Regards,
> Kuba Stańczak
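P.S. In case it helps anyone following along: since these OSDs assert within seconds of starting, injecting options into a running daemon won't work here, so my plan is to apply the debug settings Igor suggested above via ceph.conf before the next start attempt, roughly like this (a sketch; the option names are from Igor's mail, the OSD id is just an example):

    # /etc/ceph/ceph.conf on the affected host, before retrying the OSD
    [osd]
    debug_bluestore = 10
    bluestore_bluefs_balance_failure_dump_interval = 1

    systemctl start ceph-osd@6

I'll then collect the resulting OSD log from the crash window.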
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com