Hi,
I have two OSDs which are failing with an assert that looks related to missing objects. This happened after a large RBD snapshot was deleted, causing several OSDs to start flapping as they experienced high load. The cluster is fully recovered and I don't need any help from a recovery perspective. I'm happy to zap and recreate the OSDs, which I will probably do in a couple of days' time. Or if anybody looks at the error and sees an easy way to get the OSDs to start up, then bonus!
However, I thought I would post in case there is any interest in trying to diagnose why this happened. There were no power or networking issues and no hard reboots, so this is purely contained within the Ceph OSD process.
The objects that it claims are missing are from the RBD that had the snapshot deleted. I'm guessing that the last command before the OSD died was to delete those two objects, which did actually happen, but for some reason the OSD died before it got confirmation? And now it's trying to delete them, but they don't exist.
I have the full debug 20 log, but pretty much all the lines above the snippet below just show it deleting thousands of objects without any problems.
Nick
-4> 2016-11-15 09:46:52.061643 7f728f9368c0 20 read_log 6 divergent_priors
-3> 2016-11-15 09:46:52.061779 7f728f9368c0 10 read_log checking for missing items over interval (0'0,1607344'260104]
-2> 2016-11-15 09:46:52.069987 7f728f9368c0 15 read_log missing 1553246'255377,1:96e51ad6:::rbd_data.6fd18238e1f29.00000000002555c5:head
-1> 2016-11-15 09:46:52.070007 7f728f9368c0 15 read_log missing 1553190'255366,1:96e51ad6:::rbd_data.6fd18238e1f29.00000000002555c5:6c
0> 2016-11-15 09:46:52.071471 7f728f9368c0 -1 osd/PGLog.cc: In function 'static void PGLog::read_log(ObjectStore*, coll_t, coll_t, ghobject_t, const pg_info_t&, std::map<eversion_t, hobject_t>&, PGLog::IndexedLog&, pg_missing_t&, std::ostringstream&, const DoutPrefixProvider*, std::set<std::__cxx11::basic_string<char> >*)' thread 7f728f9368c0 time 2016-11-15 09:46:52.070023
osd/PGLog.cc: 1047: FAILED assert(oi.version == i->first)
ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x80) [0x5642d2734ea0]
2: (PGLog::read_log(ObjectStore*, coll_t, coll_t, ghobject_t, pg_info_t const&, std::map<eversion_t, hobject_t, std::less<eversion_t>, std::allocator<std::pair<eversion_t const, hobject_t> > >&, PGLog::IndexedLog&, pg_missing_t&, std::__cxx11::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >&, DoutPrefixProvider const*, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*)+0x719) [0x5642d22e2fd9]
3: (PG::read_state(ObjectStore*, ceph::buffer::list&)+0x2f6) [0x5642d21172d6]
4: (OSD::load_pgs()+0x87d) [0x5642d205345d]
5: (OSD::init()+0x2026) [0x5642d205e7a6]
6: (main()+0x2ea5) [0x5642d1fd08f5]
7: (__libc_start_main()+0xf0) [0x7f728c77c830]
8: (_start()+0x29) [0x5642d2011f89]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
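For anyone sifting through a similar debug 20 log, a small Python sketch along these lines could pull out the `read_log missing` entries so they can be compared against the objects of the affected RBD. It is only a sketch under assumptions: the names `parse_missing` and `MissingEntry` are made up for illustration and are not part of Ceph, and reading each entry as epoch'version followed by an object string ending in `:head` or a snapshot id is inferred from the two lines in the snippet above, not from the Ceph source.

```python
import re
from typing import List, NamedTuple

# Matches e.g. "read_log missing 1553246'255377,1:96e51ad6:::rbd_data.<id>.<n>:head".
# \s+ also tolerates a line wrap between "missing" and the entry itself.
MISSING_RE = re.compile(r"read_log missing\s+(\d+)'(\d+),(\S+)")


class MissingEntry(NamedTuple):
    """One 'read_log missing' entry (hypothetical helper type, not a Ceph type)."""
    epoch: int    # number before the apostrophe (assumed to be the epoch)
    version: int  # number after the apostrophe (assumed to be the version)
    obj: str      # object string up to the final colon
    snap: str     # trailing component, e.g. "head" or a snapshot id such as "6c"


def parse_missing(log_text: str) -> List[MissingEntry]:
    """Return every 'read_log missing' entry found in the given log text."""
    entries = []
    for epoch, version, hobj in MISSING_RE.findall(log_text):
        # Split off the last colon-separated field as the snapshot part.
        obj, snap = hobj.rsplit(":", 1)
        entries.append(MissingEntry(int(epoch), int(version), obj, snap))
    return entries
```

Calling `parse_missing(open(path).read())` on an OSD log, where `path` is wherever the log lives, gives a list that can be grouped by the `rbd_data.<id>` prefix to check whether all of the missing objects belong to the image whose snapshot was deleted.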