Hello,

I have a particular OSD (53) that crashes at random, with the OSD process 
stopping.

OS: Debian 8.x
Ceph: ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)

From the logs at the time the OSD was marked as crashed, I can only see 
the following:

    -4> 2017-02-10 23:40:16.820894 7fadbd049700  1 -- 172.16.3.7:6825/16969 <== osd.26 172.16.2.104:0/5812 1 ==== osd_ping(ping e29842 stamp 2017-02$
    -3> 2017-02-10 23:40:16.820918 7fadbd049700  1 -- 172.16.3.7:6825/16969 --> 172.16.2.104:0/5812 -- osd_ping(ping_reply e29842 stamp 2017-02-10 2$
    -2> 2017-02-10 23:40:16.822436 7faddb149700  1 -- 172.16.2.107:6820/16969 <== client.8222771 172.16.2.2:0/1125091221 86 ==== osd_op(client.82227$
    -1> 2017-02-10 23:40:16.822453 7faddb149700  5 -- op tracker -- seq: 670, time: 2017-02-10 23:40:16.822453, event: queued_for_pg, op: osd_op(cli$
     0> 2017-02-10 23:40:16.832241 7fadd0631700 -1 *** Caught signal (Aborted) **
in thread 7fadd0631700 thread_name:tp_osd_tp

ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)
1: (()+0x951cc7) [0x5556d8c4bcc7]
2: (()+0xf890) [0x7fadf5f8e890]
3: (gsignal()+0x37) [0x7fadf3fd5067]
4: (abort()+0x148) [0x7fadf3fd6448]
5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x256) [0x5556d8d51296]
6: (FileStore::read(coll_t const&, ghobject_t const&, unsigned long, unsigned long, ceph::buffer::list&, unsigned int, bool)+0xd7c) [0x5556d89e68ec]
7: (ReplicatedBackend::objects_read_sync(hobject_t const&, unsigned long, unsigned long, unsigned int, ceph::buffer::list*)+0xcd) [0x5556d885ce7d]
8: (ReplicatedPG::do_osd_ops(ReplicatedPG::OpContext*, std::vector<OSDOp, std::allocator<OSDOp> >&)+0x6355) [0x5556d87f6515]
9: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x61) [0x5556d8802101]
10: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0x936) [0x5556d880a566]
11: (ReplicatedPG::do_op(std::shared_ptr<OpRequest>&)+0x37c3) [0x5556d880f3d3]
12: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x727) [0x5556d87c6ae7]
13: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x420) [0x5556d866b650]
14: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest>&)+0x6a) [0x5556d866b8aa]
15: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x87a) [0x5556d8687f7a]
16: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x8b6) [0x5556d8d40c56]
17: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5556d8d42c10]
18: (()+0x8064) [0x7fadf5f87064]
19: (clone()+0x6d) [0x7fadf408862d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.


Does this point to a known issue, or do I need to dig deeper to find the cause?

Ashley
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com