Hi,

I have two OSDs which are failing with an assert that looks related to
missing objects. This happened after a large RBD snapshot was deleted,
causing several OSDs to start flapping as they experienced high load. The
cluster is fully recovered and I don't need any help from a recovery
perspective. I'm happy to zap and recreate the OSDs, which I will probably
do in a couple of days' time. Or if anybody looks at the error and sees an
easy way to get the OSDs to start up, then bonus!

However, I thought I would post in case there is any interest in trying to
diagnose why this happened. There were no power or networking issues and no
hard reboots, so this is purely contained within the Ceph OSD process.

The objects that it claims are missing are from the RBD that had the
snapshot deleted. I'm guessing that the last command before the OSD died was
to delete those two objects, which did actually happen, but for some reason
the OSD died before it got confirmation? And now it's trying to delete them,
but they don't exist.

I have the full debug 20 log, but pretty much all the lines above the
snippet below just show it deleting thousands of objects without any
problems.

Nick 

    -4> 2016-11-15 09:46:52.061643 7f728f9368c0 20 read_log 6 divergent_priors
    -3> 2016-11-15 09:46:52.061779 7f728f9368c0 10 read_log checking for missing items over interval (0'0,1607344'260104]
    -2> 2016-11-15 09:46:52.069987 7f728f9368c0 15 read_log  missing 1553246'255377,1:96e51ad6:::rbd_data.6fd18238e1f29.00000000002555c5:head
    -1> 2016-11-15 09:46:52.070007 7f728f9368c0 15 read_log  missing 1553190'255366,1:96e51ad6:::rbd_data.6fd18238e1f29.00000000002555c5:6c
     0> 2016-11-15 09:46:52.071471 7f728f9368c0 -1 osd/PGLog.cc: In function 'static void PGLog::read_log(ObjectStore*, coll_t, coll_t, ghobject_t, const pg_info_t&, std::map<eversion_t, hobject_t>&, PGLog::IndexedLog&, pg_missing_t&, std::ostringstream&, const DoutPrefixProvider*, std::set<std::__cxx11::basic_string<char> >*)' thread 7f728f9368c0 time 2016-11-15 09:46:52.070023
osd/PGLog.cc: 1047: FAILED assert(oi.version == i->first)

 ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x80) [0x5642d2734ea0]
 2: (PGLog::read_log(ObjectStore*, coll_t, coll_t, ghobject_t, pg_info_t const&, std::map<eversion_t, hobject_t, std::less<eversion_t>, std::allocator<std::pair<eversion_t const, hobject_t> > >&, PGLog::IndexedLog&, pg_missing_t&, std::__cxx11::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >&, DoutPrefixProvider const*, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*)+0x719) [0x5642d22e2fd9]
 3: (PG::read_state(ObjectStore*, ceph::buffer::list&)+0x2f6) [0x5642d21172d6]
 4: (OSD::load_pgs()+0x87d) [0x5642d205345d]
 5: (OSD::init()+0x2026) [0x5642d205e7a6]
 6: (main()+0x2ea5) [0x5642d1fd08f5]
 7: (__libc_start_main()+0xf0) [0x7f728c77c830]
 8: (_start()+0x29) [0x5642d2011f89]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to 
interpret this.
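
In case anyone wants to pull the "read_log missing" entries out of a debug
log like the one above, here is a rough sketch in Python. It is not a Ceph
tool: the names MISSING_RE, SAMPLE and parse_missing are made up for
illustration, and it assumes each entry has the form epoch'version,object
as it appears in the two lines above.

```python
import re

# Matches entries such as:
#   read_log  missing 1553246'255377,1:96e51ad6:::rbd_data.<id>.<n>:head
# Assumption: the token after "missing" is epoch'version,object.
MISSING_RE = re.compile(r"read_log\s+missing\s+(\d+)'(\d+),(\S+)")

# The two entries from the log above, used as sample input.
SAMPLE = (
    "-2> 2016-11-15 09:46:52.069987 7f728f9368c0 15 read_log  missing "
    "1553246'255377,1:96e51ad6:::rbd_data.6fd18238e1f29.00000000002555c5:head\n"
    "-1> 2016-11-15 09:46:52.070007 7f728f9368c0 15 read_log  missing "
    "1553190'255366,1:96e51ad6:::rbd_data.6fd18238e1f29.00000000002555c5:6c\n"
)


def parse_missing(log_text):
    """Return a list of (epoch, version, object) tuples, one per missing entry."""
    return [(int(epoch), int(version), obj)
            for epoch, version, obj in MISSING_RE.findall(log_text)]


if __name__ == "__main__":
    for epoch, version, obj in parse_missing(SAMPLE):
        print(epoch, version, obj)
```

Because the pattern allows any whitespace after "missing", it also copes with
a log that has been line-wrapped by a mail client.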
