I was experimenting with BlueStore OSDs and appear to have found a fairly
consistent way to crash them.
Reducing the number of copies in a pool from 3 to 1 has now twice caused
every OSD in the pool to crash. In one case it was a cache tier; in the
other it was just a pool hosting rbd images.
From the log file of one of the OSDs:
2016-04-05 12:09:54.272475 7f5a58027700 0 bluestore(/var/lib/ceph/osd/ceph-43)
error (39) Directory not empty not handled on operation 21 (op 1, counting
from 0)
2016-04-05 12:09:54.272489 7f5a58027700 0 bluestore(/var/lib/ceph/osd/ceph-43)
transaction dump:
{
    "ops": [
        {
            "op_num": 0,
            "op_name": "remove",
            "collection": "2.354_head",
            "oid": "#2:2ac00000::::head#"
        },
        {
            "op_num": 1,
            "op_name": "rmcoll",
            "collection": "2.354_head"
        }
    ]
}
2016-04-05 12:09:54.275114 7f5a58027700 -1 os/bluestore/BlueStore.cc: In
function 'void BlueStore::_txc_add_transaction(BlueStore::TransContext*,
ObjectStore::Transaction*)' thread 7f5a58027700 time 2016-04-05 12:09:54.272532
os/bluestore/BlueStore.cc: 4357: FAILED assert(0 == "unexpected error")
ceph version 10.1.0 (96ae8bd25f31862dbd5302f304ebf8bf1166aba6)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85)
[0x7f5a82e74a55]
2: (BlueStore::_txc_add_transaction(BlueStore::TransContext*,
ObjectStore::Transaction*)+0x77a) [0x7f5a82b02eba]
3: (BlueStore::queue_transactions(ObjectStore::Sequencer*,
std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction>
>&, std::shared_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x3a5) [0x7f5a82b056e5]
4: (ObjectStore::queue_transactions(ObjectStore::Sequencer*,
std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction>
>&, Context*, Context*, Context*, Context*, std::shared_ptr<TrackedOp>)+0x2a6)
[0x7f5a82aad0b6]
5: (OSD::RemoveWQ::_process(std::pair<boost::intrusive_ptr<PG>,
std::shared_ptr<DeletingState> >, ThreadPool::TPHandle&)+0x6e4) [0x7f5a827debb4]
6: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>,
std::shared_ptr<DeletingState> >, std::pair<boost::intrusive_ptr<PG>,
std::shared_ptr<DeletingState> > >::_void_process(void*,
ThreadPool::TPHandle&)+0x11a) [0x7f5a8283a15a]
7: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa7e) [0x7f5a82e65a9e]
8: (ThreadPool::WorkThread::entry()+0x10) [0x7f5a82e66980]
9: (()+0x7dc5) [0x7f5a80dbedc5]
10: (clone()+0x6d) [0x7f5a7f44a28d]
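For reading the log above: the error line names "op 1, counting from 0", and on Linux errno 39 is ENOTEMPTY ("Directory not empty"). The short sketch below looks up that op in the transaction dump. The helper name `find_op` is made up for illustration and is not part of Ceph.

```python
import json

# Transaction dump copied from the OSD log excerpt above.
TRANSACTION_DUMP = """
{
    "ops": [
        {"op_num": 0, "op_name": "remove",
         "collection": "2.354_head", "oid": "#2:2ac00000::::head#"},
        {"op_num": 1, "op_name": "rmcoll",
         "collection": "2.354_head"}
    ]
}
"""


def find_op(dump_text, op_num):
    """Return the op with the given zero-based op_num, or None if absent.

    Hypothetical helper for reading a transaction dump; not a Ceph API.
    """
    for op in json.loads(dump_text)["ops"]:
        if op["op_num"] == op_num:
            return op
    return None


# The log reports the unhandled error on op 1 (counting from 0).
failing = find_op(TRANSACTION_DUMP, 1)
print(failing["op_name"], failing["collection"])
```

So the op that hit the error is the `rmcoll` of collection 2.354_head, issued immediately after a `remove` in the same collection. That is consistent with the collection still holding something when its removal was attempted.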
In both cases a replicated pool with 3 copies was created, some content was
added, and then the number of copies was set down to 1. Not a common thing to
do, I know, but this works on FileStore OSDs.
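The exact commands are not given in this message. A rough sketch of the sequence described, using the standard `ceph` and `rados` CLIs, might look like the following. The pool name, PG count, object name and source file are all placeholders chosen for illustration, and a single small object may not be enough content to trigger the crash. It prints the commands by default and only runs them when `dry_run=False`, which should only be done on a disposable test cluster.

```python
import subprocess


def repro_commands(pool="bluestore-test", pg_num=64,
                   obj="obj1", src="/etc/hosts"):
    """Build the command sequence: create pool, add content, drop to 1 copy.

    All argument defaults are hypothetical placeholders, not values from
    the original report.
    """
    return [
        # Create a replicated pool and make sure it keeps 3 copies.
        ["ceph", "osd", "pool", "create", pool,
         str(pg_num), str(pg_num), "replicated"],
        ["ceph", "osd", "pool", "set", pool, "size", "3"],
        # Add some content to the pool.
        ["rados", "-p", pool, "put", obj, src],
        # Reduce the number of copies from 3 to 1.
        ["ceph", "osd", "pool", "set", pool, "size", "1"],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(repro_commands())
```

Later Ceph releases may refuse or require extra confirmation for a pool size of 1, so this sketch assumes a Jewel-era cluster like the one described here.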
This cluster was deployed using the Jewel (10.1) RPMs for Red Hat 7 from
download.ceph.com.
Steve
_______________________________________________
ceph-users mailing list
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com