These are LVM bluestore NVMe SSDs created with "ceph-volume --lvm prepare
--bluestore /dev/nvme0n1p3" i.e. without specifying wal/db devices.
OSDs were created with bluestore_min_alloc_size_ssd=4096, another modified
setting is bluestore_cache_kv_max=1073741824
DB/block usage collected by prometheus module for 3 failed and 1 survived OSDs:
ceph_bluefs_db_total_bytes{ceph_daemon="osd.0"} 65493008384.0
ceph_bluefs_db_total_bytes{ceph_daemon="osd.1"} 49013587968.0
ceph_bluefs_db_total_bytes{ceph_daemon="osd.2"} 76834406400.0 --> this one has
survived
ceph_bluefs_db_total_bytes{ceph_daemon="osd.3"} 63726157824.0
ceph_bluefs_db_used_bytes{ceph_daemon="osd.0"} 65217232896.0
ceph_bluefs_db_used_bytes{ceph_daemon="osd.1"} 48944381952.0
ceph_bluefs_db_used_bytes{ceph_daemon="osd.2"} 68093476864.0
ceph_bluefs_db_used_bytes{ceph_daemon="osd.3"} 63632834560.0
ceph_osd_stat_bytes{ceph_daemon="osd.0"} 471305551872.0
ceph_osd_stat_bytes{ceph_daemon="osd.1"} 471305551872.0
ceph_osd_stat_bytes{ceph_daemon="osd.2"} 471305551872.0
ceph_osd_stat_bytes{ceph_daemon="osd.3"} 471305551872.0
ceph_osd_stat_bytes_used{ceph_daemon="osd.0"} 222328213504.0
ceph_osd_stat_bytes_used{ceph_daemon="osd.1"} 214472544256.0
ceph_osd_stat_bytes_used{ceph_daemon="osd.2"} 163603996672.0
ceph_osd_stat_bytes_used{ceph_daemon="osd.3"} 212806815744.0
First crashed OSD was doing DB compaction, others crashed shortly after during
backfilling. Workload was "ceph-data-scan scan_inodes" filling metadata pool
located on these OSDs at the rate close to 10k objects/second.
Here is the log excerpt of the first crash occurrence:
2018-10-01 03:27:12.762 7fbf16dd6700 0 bluestore(/var/lib/ceph/osd/ceph-1)
_balance_bluefs_freespace no allocate on 0x80000000 min_alloc_size 0x1000
2018-10-01 03:27:12.886 7fbf1e5e5700 4 rocksdb:
[/build/ceph-13.2.2/src/rocksdb/db/compaction_job.cc:1166] [default] [JOB 24]
Generated table #89741: 106356 keys, 68110589 bytes
2018-10-01 03:27:12.886 7fbf1e5e5700 4 rocksdb: EVENT_LOG_v1 {"time_micros":
1538353632892744, "cf_name": "default", "job": 24, "event":
"table_file_creation", "file_number": 89741, "file_size": 68110589,
"table_properties": {"data_size": 67112903, "index_size": 579319,
"filter_size": 417316, "raw_key_size": 6733561, "raw_average_key_size": 63,
"raw_value_size": 60994583, "raw_average_value_size": 573, "num_data_blocks":
16336, "num_entries": 106356, "filter_policy_name":
"rocksdb.BuiltinBloomFilter", "kDeletedKeys": "14444", "kMergeOperands": "0"}}
2018-10-01 03:27:12.934 7fbf1e5e5700 4 rocksdb:
[/build/ceph-13.2.2/src/rocksdb/db/compaction_job.cc:1166] [default] [JOB 24]
Generated table #89742: 23214 keys, 16352315 bytes
2018-10-01 03:27:12.934 7fbf1e5e5700 4 rocksdb: EVENT_LOG_v1 {"time_micros":
1538353632938670, "cf_name": "default", "job": 24, "event":
"table_file_creation", "file_number": 89742, "file_size": 16352315,
"table_properties": {"data_size": 16116986, "index_size": 139894,
"filter_size": 94386, "raw_key_size": 1470883, "raw_average_key_size": 63,
"raw_value_size": 14775006, "raw_average_value_size": 636, "num_data_blocks":
3928, "num_entries": 23214, "filter_policy_name": "rocksdb.BuiltinBloomFilter",
"kDeletedKeys": "90", "kMergeOperands": "0"}}
2018-10-01 03:27:13.042 7fbf1e5e5700 1 bluefs _allocate failed to allocate
0x4100000 on bdev 1, free 0x1a00000; fallback to bdev 2
2018-10-01 03:27:13.042 7fbf1e5e5700 -1 bluefs _allocate failed to allocate
0x4100000 on bdev 2, dne
2018-10-01 03:27:13.042 7fbf1e5e5700 -1 bluefs _flush_range allocated: 0x0
offset: 0x0 length: 0x40ea9f1
2018-10-01 03:27:13.046 7fbf1e5e5700 -1
/build/ceph-13.2.2/src/os/bluestore/BlueFS.cc: In function 'int
BlueFS::_flush_range(BlueFS::FileWriter*, uint64_t, uint64_t)' thread
7fbf1e5e5700 time 2018-10-01 03:27:13.048298
/build/ceph-13.2.2/src/os/bluestore/BlueFS.cc: 1663: FAILED assert(0 == "bluefs
enospc")
ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x102) [0x7fbf2d4fe5c2]
2: (()+0x26c787) [0x7fbf2d4fe787]
3: (BlueFS::_flush_range(BlueFS::FileWriter*, unsigned long, unsigned
long)+0x1ab4) [0x5619325114b4]
4: (BlueRocksWritableFile::Flush()+0x3d) [0x561932527c1d]
5: (rocksdb::WritableFileWriter::Flush()+0x1b9) [0x56193271c399]
6: (rocksdb::WritableFileWriter::Sync(bool)+0x3b) [0x56193271d42b]
7: (rocksdb::CompactionJob::FinishCompactionOutputFile(rocksdb::Status const&,
rocksdb::CompactionJob::SubcompactionState*, rocksdb::RangeDelAggregator*,
CompactionIterationStats*, rocksdb::Slice const*)+0x3db) [0x56193276098b]
8:
(rocksdb::CompactionJob::ProcessKeyValueCompaction(rocksdb::CompactionJob::SubcompactionState*)+0x7d9)
[0x561932763da9]
9: (rocksdb::CompactionJob::Run()+0x314) [0x561932765504]
10: (rocksdb::DBImpl::BackgroundCompaction(bool*, rocksdb::JobContext*,
rocksdb::LogBuffer*, rocksdb::DBImpl::PrepickedCompaction*)+0xc54)
[0x5619325b5c44]
11:
(rocksdb::DBImpl::BackgroundCallCompaction(rocksdb::DBImpl::PrepickedCompaction*,
rocksdb::Env::Priority)+0x397) [0x5619325b8557]
12: (rocksdb::DBImpl::BGWorkCompaction(void*)+0x97) [0x5619325b8cd7]
13: (rocksdb::ThreadPoolImpl::Impl::BGThread(unsigned long)+0x266)
[0x5619327a5e36]
14: (rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper(void*)+0x47)
[0x5619327a5fb7]
15: (()+0xbe733) [0x7fbf2b500733]
16: (()+0x76db) [0x7fbf2bbf86db]
17: (clone()+0x3f) [0x7fbf2abbc88f]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
interpret this.
> On 1.10.2018, at 15:01, Igor Fedotov <[email protected]> wrote:
>
> Hi Sergey,
>
> could you please provide more details on your OSDs ?
>
> What are sizes for DB/block devices?
>
> Do you have any modifications in BlueStore config settings?
>
> Can you share stats you're referring to?
>
>
> Thanks,
>
> Igor
>
>
> On 10/1/2018 12:29 PM, Sergey Malinin wrote:
>> Hello,
>> 3 of 4 NVME OSDs crashed at the same time on assert(0 == "bluefs enospc")
>> and no longer start.
>> Stats collected just before crash show that ceph_bluefs_db_used_bytes is
>> 100% used. Although OSDs have over 50% of free space, it is not reallocated
>> for DB usage.
>>
>> 2018-10-01 12:18:06.744 7f1d6a04d240 1 bluefs _allocate failed to allocate
>> 0x100000 on bdev 1, free 0x0; fallback to bdev 2
>> 2018-10-01 12:18:06.744 7f1d6a04d240 -1 bluefs _allocate failed to allocate
>> 0x100000 on bdev 2, dne
>> 2018-10-01 12:18:06.744 7f1d6a04d240 -1 bluefs _flush_range allocated: 0x0
>> offset: 0x0 length: 0xa8700
>> 2018-10-01 12:18:06.748 7f1d6a04d240 -1
>> /build/ceph-13.2.2/src/os/bluestore/BlueFS.cc: In function 'int
>> BlueFS::_flush_range(BlueFS::FileWriter*, uint64_t, uint64_t)' thread
>> 7f1d6a04d240 time 2018-10-01 12:18:06.746800
>> /build/ceph-13.2.2/src/os/bluestore/BlueFS.cc: 1663: FAILED assert(0 ==
>> "bluefs enospc")
>>
>> ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic
>> (stable)
>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> const*)+0x102) [0x7f1d6146f5c2]
>> 2: (()+0x26c787) [0x7f1d6146f787]
>> 3: (BlueFS::_flush_range(BlueFS::FileWriter*, unsigned long, unsigned
>> long)+0x1ab4) [0x5586b22684b4]
>> 4: (BlueRocksWritableFile::Flush()+0x3d) [0x5586b227ec1d]
>> 5: (rocksdb::WritableFileWriter::Flush()+0x1b9) [0x5586b2473399]
>> 6: (rocksdb::WritableFileWriter::Sync(bool)+0x3b) [0x5586b247442b]
>> 7: (rocksdb::BuildTable(std::__cxx11::basic_string<char,
>> std::char_traits<char>, std::allocator<char> > const&, rocksdb::Env*,
>> rocksdb::ImmutableCFOptions const&, rocksdb::MutableCFOptions const&,
>> rocksdb::EnvOptions const&, rock
>> sdb::TableCache*, rocksdb::InternalIterator*,
>> std::unique_ptr<rocksdb::InternalIterator,
>> std::default_delete<rocksdb::InternalIterator> >, rocksdb::FileMetaData*,
>> rocksdb::InternalKeyComparator const&, std::vector<std::unique_ptr<
>> rocksdb::IntTblPropCollectorFactory,
>> std::default_delete<rocksdb::IntTblPropCollectorFactory> >,
>> std::allocator<std::unique_ptr<rocksdb::IntTblPropCollectorFactory,
>> std::default_delete<rocksdb::IntTblPropCollectorFactory> > > > co
>> nst*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>,
>> std::allocator<char> > const&, std::vector<unsigned long,
>> std::allocator<unsigned long> >, unsigned long, rocksdb::SnapshotChecker*,
>> rocksdb::Compression
>> Type, rocksdb::CompressionOptions const&, bool, rocksdb::InternalStats*,
>> rocksdb::TableFileCreationReason, rocksdb::EventLogger*, int,
>> rocksdb::Env::IOPriority, rocksdb::TableProperties*, int, unsigned long,
>> unsigned long, rocksdb
>> ::Env::WriteLifeTimeHint)+0x1e24) [0x5586b249ef94]
>> 8: (rocksdb::DBImpl::WriteLevel0TableForRecovery(int,
>> rocksdb::ColumnFamilyData*, rocksdb::MemTable*,
>> rocksdb::VersionEdit*)+0xcb7) [0x5586b2321457]
>> 9: (rocksdb::DBImpl::RecoverLogFiles(std::vector<unsigned long,
>> std::allocator<unsigned long> > const&, unsigned long*, bool)+0x19de)
>> [0x5586b232373e]
>> 10: (rocksdb::DBImpl::Recover(std::vector<rocksdb::ColumnFamilyDescriptor,
>> std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, bool, bool,
>> bool)+0x5d4) [0x5586b23242f4]
>> 11: (rocksdb::DBImpl::Open(rocksdb::DBOptions const&,
>> std::__cxx11::basic_string<char, std::char_traits<char>,
>> std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor,
>> std::allocator<rocksdb::ColumnFamilyDescri
>> ptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*,
>> std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**, bool)+0x68b)
>> [0x5586b232559b]
>> 12: (rocksdb::DB::Open(rocksdb::DBOptions const&,
>> std::__cxx11::basic_string<char, std::char_traits<char>,
>> std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor,
>> std::allocator<rocksdb::ColumnFamilyDescriptor
>>>> const&, std::vector<rocksdb::ColumnFamilyHandle*,
>>>> std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**)+0x22)
>>>> [0x5586b2326e72]
>> 13: (RocksDBStore::do_open(std::ostream&, bool,
>> std::vector<KeyValueDB::ColumnFamily,
>> std::allocator<KeyValueDB::ColumnFamily> > const*)+0x170c) [0x5586b220219c]
>> 14: (BlueStore::_open_db(bool, bool)+0xd8e) [0x5586b218ee1e]
>> 15: (BlueStore::_mount(bool, bool)+0x4b7) [0x5586b21bf807]
>> 16: (OSD::init()+0x295) [0x5586b1d673c5]
>> 17: (main()+0x268d) [0x5586b1c554ed]
>> 18: (__libc_start_main()+0xe7) [0x7f1d5ea2db97]
>> 19: (_start()+0x2a) [0x5586b1d1d7fa]
>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
>> interpret this.
>>
>> _______________________________________________
>> ceph-users mailing list
>> [email protected]
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com