I have 2 OSDs failing to start due to this [1] assertion failure.  What is
happening matches what Sage said about this [2] bug.  The OSDs are on NVMe
disks, and RocksDB is compacting omaps.  I tried setting
`bluestore_bluefs_min_free = 10737418240` and then starting the OSDs, but they
both hit the same assert. The failure happens immediately on OSD start, within
5 seconds. Is there any testing that would be helpful for figuring this out
and/or getting these 2 OSDs back up? All data has successfully migrated off
them, so I'm at HEALTH_OK with them marked out.


[1] FAILED assert(0 == "bluefs enospc")

[2] https://bugzilla.redhat.com/show_bug.cgi?id=1600138
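
For reference, this is roughly how I applied the override before retrying (a
sketch only; the OSD ids below are examples standing in for the two affected
OSDs, and on Mimic or later the same value could also be set with
`ceph config set osd bluestore_bluefs_min_free 10737418240`):

    # /etc/ceph/ceph.conf on the OSD host
    [osd]
    bluestore_bluefs_min_free = 10737418240   # 10 GiB

    # then restart the affected OSDs
    systemctl start ceph-osd@12
    systemctl start ceph-osd@37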


On Tue, Aug 14, 2018 at 12:29 PM Igor Fedotov <ifedo...@suse.de> wrote:

> Hi Jakub,
>
> For the crashing OSD, could you please set
>
> debug_bluestore=10
>
> bluestore_bluefs_balance_failure_dump_interval=1
>
>
> and collect more logs.
>
> This will hopefully provide more insight into why additional space isn't
> being allocated for BlueFS.
>
> Thanks,
>
> Igor
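
For the crashing OSD, one way to apply Igor's settings is via ceph.conf before
the next start attempt, since the daemon dies too quickly for an admin-socket
injection. This is just a sketch; the osd id is taken from the log snippet
below:

    # /etc/ceph/ceph.conf on the affected host
    [osd.6]
    debug_bluestore = 10
    bluestore_bluefs_balance_failure_dump_interval = 1

    # start the OSD and collect its log (default log path shown)
    systemctl start ceph-osd@6
    less /var/log/ceph/ceph-osd.6.log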
>
> On 8/14/2018 12:41 PM, Jakub Stańczak wrote:
>
> Hello All!
>
> I am running a full-BlueStore Mimic cluster with a pure RGW workload. We use
> the AWS i3 instance family for OSD machines: each instance has 1 NVMe disk,
> which is split into 4 partitions, and each partition serves as the BlueStore
> block device for one OSD. Each OSD uses just that single device (no separate
> WAL/DB), so everything is managed by BlueStore internally.
>
> The problem is that under write-heavy conditions the DB device grows fast,
> and at some point BlueFS stops getting more space, which results in OSD
> death. There is no recovery from this error: when BlueFS runs out of space
> for RocksDB, the OSD dies and cannot be restarted.
>
> With this particular OSD there is plenty of free space, but we can see that
> it cannot allocate more space for BlueFS; the log repeatedly shows
> '_balance_bluefs_freespace no allocate on 0x80000000'.
>
> I've also done some BlueFS tuning, because I previously had similar problems
> where it appeared that BlueStore could not keep up with providing enough
> storage for BlueFS.
>
> bluefs settings:
>
> bluestore_bluefs_balance_interval = 0.333
> bluestore_bluefs_gift_ratio = 0.05
> bluestore_bluefs_min_free = 3221225472
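
(For reference, the effective values on a running OSD can be double-checked
via the admin socket; the osd id here is just an example:

    ceph daemon osd.6 config get bluestore_bluefs_balance_interval
    ceph daemon osd.6 config get bluestore_bluefs_gift_ratio
    ceph daemon osd.6 config get bluestore_bluefs_min_free
)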
>
> A snippet from the OSD log:
>
> 2018-08-13 18:15:10.960 7f6a54073700  0 bluestore(/var/lib/ceph/osd/ceph-6) 
> _balance_bluefs_freespace no allocate on 0x80000000 min_alloc_size 0x2000
> 2018-08-13 18:15:11.330 7f6a54073700  0 bluestore(/var/lib/ceph/osd/ceph-6) 
> _balance_bluefs_freespace no allocate on 0x80000000 min_alloc_size 0x2000
> 2018-08-13 18:15:11.752 7f6a54073700  0 bluestore(/var/lib/ceph/osd/ceph-6) 
> _balance_bluefs_freespace no allocate on 0x80000000 min_alloc_size 0x2000
> 2018-08-13 18:15:11.785 7f6a5b882700  4 rocksdb: 
> [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.1/rpm/el7/BUILD/ceph-13.2.1/src/rocksdb
> /db/compaction_job.cc:1166] [default] [JOB 41] Generated table #14590: 304401 
> keys, 68804532 bytes
> 2018-08-13 18:15:11.785 7f6a5b882700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 
> 1534184111786253, "cf_name": "default", "job": 41, "event": 
> "table_file_creation", "file_number": 14590, "file_size": 68804532, 
> "table_properties": {"data_size
> ": 67112437, "index_size": 777792, "filter_size": 913252, "raw_key_size": 
> 13383306, "raw_average_key_size": 43, "raw_value_size": 58673606, 
> "raw_average_value_size": 192, "num_data_blocks": 17090, "num_entries": 
> 304401, "filter_policy_na
> me": "rocksdb.BuiltinBloomFilter", "kDeletedKeys": "0", "kMergeOperands": 
> "0"}}
> 2018-08-13 18:15:12.245 7f6a54073700  0 bluestore(/var/lib/ceph/osd/ceph-6) 
> _balance_bluefs_freespace no allocate on 0x80000000 min_alloc_size 0x2000
> 2018-08-13 18:15:12.664 7f6a54073700  0 bluestore(/var/lib/ceph/osd/ceph-6) 
> _balance_bluefs_freespace no allocate on 0x80000000 min_alloc_size 0x2000
> 2018-08-13 18:15:12.743 7f6a5b882700  4 rocksdb: 
> [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.1/rpm/el7/BUILD/ceph-13.2.1/src/rocksdb
> /db/compaction_job.cc:1166] [default] [JOB 41] Generated table #14591: 313351 
> keys, 68830515 bytes
> 2018-08-13 18:15:12.743 7f6a5b882700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 
> 1534184112744129, "cf_name": "default", "job": 41, "event": 
> "table_file_creation", "file_number": 14591, "file_size": 68830515, 
> "table_properties": {"data_size
> ": 67109446, "index_size": 785852, "filter_size": 934166, "raw_key_size": 
> 13762246, "raw_average_key_size": 43, "raw_value_size": 58469928, 
> "raw_average_value_size": 186, "num_data_blocks": 17124, "num_entries": 
> 313351, "filter_policy_na
> me": "rocksdb.BuiltinBloomFilter", "kDeletedKeys": "0", "kMergeOperands": 
> "0"}}
> 2018-08-13 18:15:13.025 7f6a54073700  0 bluestore(/var/lib/ceph/osd/ceph-6) 
> _balance_bluefs_freespace no allocate on 0x80000000 min_alloc_size 0x2000
> 2018-08-13 18:15:13.405 7f6a5b882700  1 bluefs _allocate failed to allocate 
> 0x4200000 on bdev 1, free 0x3500000; fallback to bdev 2
> 2018-08-13 18:15:13.405 7f6a5b882700 -1 bluefs _allocate failed to allocate 
> 0x4200000 on bdev 2, dne
> 2018-08-13 18:15:13.405 7f6a5b882700 -1 bluefs _flush_range allocated: 0x0 
> offset: 0x0 length: 0x419db1f
> 2018-08-13 18:15:13.405 7f6a54073700  0 bluestore(/var/lib/ceph/osd/ceph-6) 
> _balance_bluefs_freespace no allocate on 0x80000000 min_alloc_size 0x2000
> 2018-08-13 18:15:13.409 7f6a5b882700 -1 
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.1/rpm/el7/BUILD/ceph-13.2.1/src/os/bluestore/Blue
> FS.cc: In function 'int BlueFS::_flush_range(BlueFS::FileWriter*, uint64_t, 
> uint64_t)' thread 7f6a5b882700 time 2018-08-13 18:15:13.406645
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.1/rpm/el7/BUILD/ceph-13.2.1/src/os/bluestore/BlueFS.cc:
>  1663: FAILED assert(0 == "bluefs
> enospc")
>
>  ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
> const*)+0xff) [0x7f6a6b660e1f]
>  2: (()+0x284fe7) [0x7f6a6b660fe7]
>  3: (BlueFS::_flush_range(BlueFS::FileWriter*, unsigned long, unsigned 
> long)+0x1ac6) [0x55f6c6db9146]
>  4: (BlueRocksWritableFile::Flush()+0x3d) [0x55f6c6dcf0cd]
>  5: (rocksdb::WritableFileWriter::Flush()+0x196) [0x55f6c6faf7c6]
>  6: (rocksdb::WritableFileWriter::Sync(bool)+0x2e) [0x55f6c6fafa8e]
>  7: (rocksdb::CompactionJob::FinishCompactionOutputFile(rocksdb::Status 
> const&, rocksdb::CompactionJob::SubcompactionState*, 
> rocksdb::RangeDelAggregator*, CompactionIterationStats*, rocksdb::Slice 
> const*)+0x73b) [0x55f6c6fed26b]
>  8: 
> (rocksdb::CompactionJob::ProcessKeyValueCompaction(rocksdb::CompactionJob::SubcompactionState*)+0x77f)
>  [0x55f6c6feff3f]
>  9: (rocksdb::CompactionJob::Run()+0x2c8) [0x55f6c6ff1508]
>  10: (rocksdb::DBImpl::BackgroundCompaction(bool*, rocksdb::JobContext*, 
> rocksdb::LogBuffer*, rocksdb::DBImpl::PrepickedCompaction*)+0xab4) 
> [0x55f6c6e57da4]
>  11: 
> (rocksdb::DBImpl::BackgroundCallCompaction(rocksdb::DBImpl::PrepickedCompaction*,
>  rocksdb::Env::Priority)+0xd0) [0x55f6c6e59680]
>  12: (rocksdb::DBImpl::BGWorkCompaction(void*)+0x3a) [0x55f6c6e59b6a]
>  13: (rocksdb::ThreadPoolImpl::Impl::BGThread(unsigned long)+0x266) 
> [0x55f6c7034536]
>  14: (rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper(void*)+0x4f) 
> [0x55f6c70346bf]
>  15: (()+0x6ae17f) [0x7f6a6ba8a17f]
>  16: (()+0x7e25) [0x7f6a681c5e25]
>  17: (clone()+0x6d) [0x7f6a672b5bad]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to 
> interpret this.
>
>  Has anyone stumbled upon a similar problem? It looks like a bug to me; it
> has happened on several OSDs already, always with a different BlueFS size
> and a different OSD utilization.
>
> Best Regards, Kuba Stańczak
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
