Hello Cephers!
Trying to repair an inconsistent PG results in the OSD dying with an
assertion failure:
0> 2015-12-01 07:22:13.398006 7f76d6594700 -1 osd/SnapMapper.cc:
In function 'int SnapMapper::get_snaps(const hobject_t&
, SnapMapper::object_snaps*)' thread 7f76d6594700 time 2015-12-01
07:22:13.394900
osd/SnapMapper.cc: 153: FAILED assert(!out->snaps.empty())
ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x8b) [0xbc60eb]
2: (SnapMapper::get_snaps(hobject_t const&,
SnapMapper::object_snaps*)+0x40c) [0x72aecc]
3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t,
std::less<snapid_t>, std::allocator<snapid_t> >*)+0xa2) [0x72b062]
4: (PG::_scan_snaps(ScrubMap&)+0x454) [0x7f2f84]
5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
unsigned int, ThreadPool::TPHandle&)+0x218) [0x7f3ba8]
6: (PG::chunky_scrub(ThreadPool::TPHandle&)+0x480) [0x7f9da0]
7: (PG::scrub(ThreadPool::TPHandle&)+0x2ee) [0x7fb48e]
8: (OSD::ScrubWQ::_process(PG*, ThreadPool::TPHandle&)+0x19) [0x6cdbf9]
9: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0xbb6b4e]
10: (ThreadPool::WorkThread::entry()+0x10) [0xbb7bf0]
11: (()+0x8182) [0x7f76fe072182]
12: (clone()+0x6d) [0x7f76fc5dd47d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 rbd_replay
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
0/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 keyvaluestore
1/ 3 journal
0/ 5 ms
1/ 5 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/10 civetweb
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
0/ 0 refs
1/ 5 xio
-2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 10000
max_new 1000
log_file /var/log/ceph/ceph-osd.339.log
--- end dump of recent events ---
2015-12-01 07:22:13.476525 7f76d6594700 -1 *** Caught signal (Aborted) **
in thread 7f76d6594700
ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
1: /usr/bin/ceph-osd() [0xacd7ba]
2: (()+0x10340) [0x7f76fe07a340]
3: (gsignal()+0x39) [0x7f76fc519cc9]
4: (abort()+0x148) [0x7f76fc51d0d8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f76fce24535]
6: (()+0x5e6d6) [0x7f76fce226d6]
7: (()+0x5e703) [0x7f76fce22703]
8: (()+0x5e922) [0x7f76fce22922]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x278) [0xbc62d8]
10: (SnapMapper::get_snaps(hobject_t const&,
SnapMapper::object_snaps*)+0x40c) [0x72aecc]
11: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t,
std::less<snapid_t>, std::allocator<snapid_t> >*)+0xa2) [0x72b062]
12: (PG::_scan_snaps(ScrubMap&)+0x454) [0x7f2f84]
13: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
unsigned int, ThreadPool::TPHandle&)+0x218) [0x7f3ba8]
14: (PG::chunky_scrub(ThreadPool::TPHandle&)+0x480) [0x7f9da0]
15: (PG::scrub(ThreadPool::TPHandle&)+0x2ee) [0x7fb48e]
16: (OSD::ScrubWQ::_process(PG*, ThreadPool::TPHandle&)+0x19) [0x6cdbf9]
17: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0xbb6b4e]
18: (ThreadPool::WorkThread::entry()+0x10) [0xbb7bf0]
19: (()+0x8182) [0x7f76fe072182]
20: (clone()+0x6d) [0x7f76fc5dd47d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
--- begin dump of recent events ---
-4> 2015-12-01 07:22:13.403280 7f76e4db1700 1 --
10.9.246.104:6887/8548 <== osd.109 10.9.245.204:0/3407 13 ====
osd_ping(ping e320057 stamp 2015-12-01 07:22:13.399779) v2 ==== 47+0+0
(1340520147 0 0) 0x22456800 con 0x22340b00
-3> 2015-12-01 07:22:13.403313 7f76e4db1700 1 --
10.9.246.104:6887/8548 --> 10.9.245.204:0/3407 -- osd_ping(ping_reply
e320057 stamp 2015-12-01 07:22:13.399779) v2 -- ?+0 0x23e3be00 con
0x22340b00
-2> 2015-12-01 07:22:13.403365 7f76e35ae700 1 --
10.9.246.104:6883/8548 <== osd.109 10.9.245.204:0/3407 13 ====
osd_ping(ping e320057 stamp 2015-12-01 07:22:13.399779) v2 ==== 47+0+0
(1340520147 0 0) 0x22457600 con 0x22570d60
-1> 2015-12-01 07:22:13.403405 7f76e35ae700 1 --
10.9.246.104:6883/8548 --> 10.9.245.204:0/3407 -- osd_ping(ping_reply
e320057 stamp 2015-12-01 07:22:13.399779) v2 -- ?+0 0x23e3fe00 con
0x22570d60
0> 2015-12-01 07:22:13.476525 7f76d6594700 -1 *** Caught signal
(Aborted) **
in thread 7f76d6594700
ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
1: /usr/bin/ceph-osd() [0xacd7ba]
2: (()+0x10340) [0x7f76fe07a340]
3: (gsignal()+0x39) [0x7f76fc519cc9]
4: (abort()+0x148) [0x7f76fc51d0d8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f76fce24535]
6: (()+0x5e6d6) [0x7f76fce226d6]
7: (()+0x5e703) [0x7f76fce22703]
8: (()+0x5e922) [0x7f76fce22922]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x278) [0xbc62d8]
10: (SnapMapper::get_snaps(hobject_t const&,
SnapMapper::object_snaps*)+0x40c) [0x72aecc]
11: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t,
std::less<snapid_t>, std::allocator<snapid_t> >*)+0xa2) [0x72b062]
12: (PG::_scan_snaps(ScrubMap&)+0x454) [0x7f2f84]
13: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool,
unsigned int, ThreadPool::TPHandle&)+0x218) [0x7f3ba8]
14: (PG::chunky_scrub(ThreadPool::TPHandle&)+0x480) [0x7f9da0]
15: (PG::scrub(ThreadPool::TPHandle&)+0x2ee) [0x7fb48e]
16: (OSD::ScrubWQ::_process(PG*, ThreadPool::TPHandle&)+0x19) [0x6cdbf9]
17: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0xbb6b4e]
18: (ThreadPool::WorkThread::entry()+0x10) [0xbb7bf0]
19: (()+0x8182) [0x7f76fe072182]
20: (clone()+0x6d) [0x7f76fc5dd47d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 rbd_replay
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
0/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 keyvaluestore
1/ 3 journal
0/ 5 ms
1/ 5 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/10 civetweb
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
0/ 0 refs
1/ 5 xio
-2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 10000
max_new 1000
log_file /var/log/ceph/ceph-osd.339.log
--- end dump of recent events ---
2015-12-01 07:22:13.889279 7f0be9daf900 0 ceph version 0.94.5
(9764da52395923e0b32908d83a9f7304401fee43), process ceph-osd, pid
12810
2015-12-01 07:22:13.904298 7f0be9daf900 0
filestore(/var/lib/ceph/osd/ceph-339) backend xfs (magic 0x58465342)
Since the assertion mentions snapshots, I generously deleted some
snapshots and ran the repair again. This took two other boxes out of
operation, with kernel hung tasks (ceph-osd processes waiting on
xfs_fs_sync) and a load of ~10000. Thankfully, power-cycling those
boxes was enough.
osd-339 is now much more chatty before dying:
http://www.traced.net/u/toasta/tmp/ceph-osd.339.log.txt
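In case it helps anyone wading through a log this chatty: below is a
minimal sketch for pulling out the "FAILED assert" lines and counting how
often each one occurs. It is not a Ceph tool; the function name
find_failed_asserts and the regular expression are my own assumptions,
based only on the format of the assert line quoted above.

```python
import re
from collections import Counter

# Matches lines such as:
#   osd/SnapMapper.cc: 153: FAILED assert(!out->snaps.empty())
# The pattern is an assumption derived from the log excerpt in this mail.
ASSERT_RE = re.compile(r"(\S+): (\d+): FAILED assert\((.*)\)\s*$")


def find_failed_asserts(log_text):
    """Count each (source file, line number, expression) seen in the log."""
    counts = Counter()
    for line in log_text.splitlines():
        match = ASSERT_RE.search(line)
        if match:
            source, lineno, expr = match.groups()
            counts[(source, int(lineno), expr)] += 1
    return counts


# Small inline sample; for a real log, read the file contents instead.
sample = """\
osd/SnapMapper.cc: 153: FAILED assert(!out->snaps.empty())
 ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
osd/SnapMapper.cc: 153: FAILED assert(!out->snaps.empty())
"""

for (source, lineno, expr), n in find_failed_asserts(sample).items():
    print(f"{source}:{lineno} assert({expr}) seen {n} time(s)")
```

To run it against the real log, pass the contents of
/var/log/ceph/ceph-osd.339.log to find_failed_asserts instead of the
inline sample.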
How do I get this PG to cooperate again?
Is it safe to just delete it from the filesystem and let it repair
(from one of the replicas)?
Thanks in advance,
Benedikt