We've been doing some performance testing on BlueStore to see whether it could 
be viable to use in the future.
The good news is that we are seeing significant performance improvements when 
using it, so thank you for all the work that has gone into it.
The bad news is that we keep encountering crashes and corruption that require a 
rebuild. An example log extract looks like the following:

2016-09-03 17:56:57.337756 7f593fe9b700 -1 freelist release bad release 564521787392~4096 overlaps with 564521787392~4096
2016-09-03 17:56:57.340169 7f593fe9b700 -1 os/bluestore/FreelistManager.cc: In function 'int FreelistManager::release(uint64_t, uint64_t, KeyValueDB::Transaction)' thread 7f593fe9b700 time 2016-09-03 17:56:57.338393
os/bluestore/FreelistManager.cc: 245: FAILED assert(0 == "bad release overlap")

 ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7f59643945b5]
 2: (FreelistManager::release(unsigned long, unsigned long, std::shared_ptr<KeyValueDB::TransactionImpl>)+0x533) [0x7f596402c963]
 3: (BlueStore::_txc_update_fm(BlueStore::TransContext*)+0x317) [0x7f5963fda0d7]
 4: (BlueStore::_kv_sync_thread()+0x9b0) [0x7f5964003690]
 5: (BlueStore::KVSyncThread::entry()+0xd) [0x7f59640299dd]
 6: (()+0x7dc5) [0x7f59622c4dc5]
 7: (clone()+0x6d) [0x7f596095021d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
-10000> 2016-09-03 17:43:23.940910 7f593fe9b700  5 rocksdb: EmitPhysicalRecord: log 37 offset 70561038 len 23 crc 696327790
 -9999> 2016-09-03 17:43:23.965781 7f593fe9b700  5 rocksdb: EmitPhysicalRecord: log 37 offset 70561061 len 2178 crc 2109984297
 -9998> 2016-09-03 17:43:23.965833 7f593fe9b700  5 rocksdb: EmitPhysicalRecord: log 37 offset 70563239 len 2175 crc 493419836
 -9997> 2016-09-03 17:43:23.965867 7f593fe9b700  5 rocksdb: EmitPhysicalRecord: log 37 offset 70565414 len 23 crc 1766806723
...
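
For anyone reading along who has not seen this assert before: our reading of the message (an assumption on our part, not taken from the Ceph source) is that an extent, written as offset~length, is being released to the freelist while an overlapping extent is already recorded as free. A minimal Python sketch of that kind of check follows; `ToyFreelist` and `overlaps` are hypothetical names for illustration only and are not the FreelistManager code.

```python
# Illustrative sketch only: NOT the Ceph FreelistManager implementation.
# Extents are written offset~length, as in the log line above.

def overlaps(a_off, a_len, b_off, b_len):
    """True if half-open ranges [a_off, a_off+a_len) and [b_off, b_off+b_len) intersect."""
    return a_off < b_off + b_len and b_off < a_off + a_len


class ToyFreelist:
    """Hypothetical in-memory free list mapping offset -> length."""

    def __init__(self):
        self.free = {}

    def release(self, off, length):
        # Refuse to free space that is already recorded as free.
        for f_off, f_len in self.free.items():
            if overlaps(off, length, f_off, f_len):
                raise AssertionError(
                    f"bad release {off}~{length} overlaps with {f_off}~{f_len}")
        self.free[off] = length


fl = ToyFreelist()
fl.release(564521787392, 4096)      # first release is accepted
message = None
try:
    fl.release(564521787392, 4096)  # same extent released a second time
except AssertionError as e:
    message = str(e)
print(message)
```

In this toy model the second release of the same 4096-byte extent is what trips the check, which is the same shape as the extents in our log, where both sides of the overlap are 564521787392~4096.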

This appears to match this existing bug: http://tracker.ceph.com/issues/15659
Are there any known workarounds to prevent the issue from happening? What 
information and support can we provide to help progress a fix for the issue? 
We seem to be able to reliably reproduce the issue after about 6 hours or so 
of test running, so we would be able to test any proposed fixes if that would 
be helpful.
Thanks,
Thomas
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com