We see this error on Hammer 0.94.6.
Bug report updated with logs.
On 11/15/2016 07:30 PM, Samuel Just wrote:
> I just pushed a branch wip-17916-jewel based on v10.2.3 with some
> additional debugging. Once it builds, would you be able to start the
> afflicted osds with that version of ceph-osd and
> debug osd = 20
> debug ms = 1
> debug filestore = 20
> and get me the log?
> On Tue, Nov 15, 2016 at 2:06 AM, Nick Fisk <n...@fisk.me.uk> wrote:
>> I have two OSDs which are failing with an assert which looks related to
>> missing objects. This happened after a large RBD snapshot
>> was deleted, causing several OSDs to start flapping as they experienced high
>> load. The cluster is fully recovered and I don't need any
>> help from a recovery perspective. I'm happy to zap and recreate the OSDs, which
>> I will probably do in a couple of days' time. Or if
>> anybody looks at the error and sees an easy way to get the OSDs to start up,
>> then bonus!
>> However, I thought I would post in case there is any interest in trying to
>> diagnose why this happened. There were no power or
>> networking issues and no hard reboots, so this is purely contained within
>> the Ceph OSD process.
>> The objects that it claims are missing are from the RBD that had the
>> snapshot deleted. My guess is that the last operation before the
>> OSD died was to delete those two objects, and that the deletion did actually
>> happen, but for some reason the OSD died before it got
>> confirmation. And now it's trying to delete them, but they don't exist.
>> I have the full debug 20 log, but pretty much all the lines above the
>> snippet below just show it deleting thousands of objects
>> without any problems.
>> -4> 2016-11-15 09:46:52.061643 7f728f9368c0 20 read_log 6 divergent_priors
>> -3> 2016-11-15 09:46:52.061779 7f728f9368c0 10 read_log checking for
>> missing items over interval (0'0,1607344'260104]
>> -2> 2016-11-15 09:46:52.069987 7f728f9368c0 15 read_log missing
>> -1> 2016-11-15 09:46:52.070007 7f728f9368c0 15 read_log missing
>> 0> 2016-11-15 09:46:52.071471 7f728f9368c0 -1 osd/PGLog.cc: In function
>> 'static void PGLog::read_log(ObjectStore*, coll_t,
>> coll_t, ghobject_t, const pg_info_t&, std::map<eversion_t, hobject_t>&,
>> PGLog::IndexedLog&, pg_missing_t&, std::ostringstream&,
>> const DoutPrefixProvider*, std::set<std::__cxx11::basic_string<char> >*)'
>> thread 7f728f9368c0 time 2016-11-15 09:46:52.070023
>> osd/PGLog.cc: 1047: FAILED assert(oi.version == i->first)
>> ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> const*)+0x80) [0x5642d2734ea0]
>> 2: (PGLog::read_log(ObjectStore*, coll_t, coll_t, ghobject_t, pg_info_t
>> const&, std::map<eversion_t, hobject_t,
>> std::less<eversion_t>, std::allocator<std::pair<eversion_t const, hobject_t>
>> > >&, PGLog::IndexedLog&, pg_missing_t&,
>> std::__cxx11::basic_ostringstream<char, std::char_traits<char>,
>> std::allocator<char> >&, DoutPrefixProvider const*,
>> std::set<std::__cxx11::basic_string<char, std::char_traits<char>,
>> std::allocator<char> >, std::less<std::__cxx11::basic_string<char,
>> std::char_traits<char>, std::allocator<char> > >,
>> std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>,
>> std::allocator<char> > > >*)+0x719) [0x5642d22e2fd9]
>> 3: (PG::read_state(ObjectStore*, ceph::buffer::list&)+0x2f6)
>> 4: (OSD::load_pgs()+0x87d) [0x5642d205345d]
>> 5: (OSD::init()+0x2026) [0x5642d205e7a6]
>> 6: (main()+0x2ea5) [0x5642d1fd08f5]
>> 7: (__libc_start_main()+0xf0) [0x7f728c77c830]
>> 8: (_start()+0x29) [0x5642d2011f89]
>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
>> interpret this.
>> ceph-users mailing list
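For anyone triaging a similar crash across several OSD logs, here is a minimal sketch that pulls the failed assertion (source file, line number, expression) out of raw log text. It assumes the assert line has the same shape as the one in the trace above and that the log has no mail quote markers in front of each line. The function name `find_failed_asserts` and the embedded sample are illustrative only and are not part of Ceph.

```python
import re

# Matches a line shaped like the one in the trace above:
#   osd/PGLog.cc: 1047: FAILED assert(oi.version == i->first)
# Assumption: the line starts at column 0 of the raw log (no "> " prefixes).
ASSERT_RE = re.compile(
    r"^(?P<file>\S+):\s*(?P<line>\d+):\s*FAILED assert\((?P<expr>.*)\)\s*$",
    re.MULTILINE,
)


def find_failed_asserts(log_text):
    """Return a list of (source_file, line_number, expression) tuples."""
    return [
        (m.group("file"), int(m.group("line")), m.group("expr"))
        for m in ASSERT_RE.finditer(log_text)
    ]


# Two lines copied from the trace above, used as a small self-check.
SAMPLE = """\
osd/PGLog.cc: 1047: FAILED assert(oi.version == i->first)
 ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
"""

for source_file, line_number, expression in find_failed_asserts(SAMPLE):
    print(f"{source_file}:{line_number}: assert({expression})")
```

Running the same function over each affected OSD's log makes it easy to check whether they all hit the same assert at the same source line, which is useful when comparing the Jewel and Hammer reports in this thread.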