Attaching the log with logging set to level 20.
After repeated attempts at removing nobackfill I have got it down to:
recovery 31892/272325586 objects degraded (0.012%)
recovery 2/272325586 objects misplaced (0.000%)
However, any further attempt after removing nobackfill just causes an instant
crash on OSDs 83 & 84. At this point I feel there is some corruption on the
remaining 11 OSDs of the PG, although the errors don't directly say that; the
crash always ends with:
-1 *** Caught signal (Aborted) ** in thread 7f716e862700
thread_name:tp_osd_recov
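(For reference, the nobackfill toggling and the degraded/misplaced counts above come from the standard CLI workflow; a minimal sketch, assuming the default cluster name and admin keyring:)

# let backfill resume and watch whether osd.83 / osd.84 crash again
ceph osd unset nobackfill
ceph -w
# pause backfill once they start crashing, then re-check the counters for pg 6.14
ceph osd set nobackfill
ceph health detail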
,Ashley
From: ceph-users [mailto:[email protected]] On Behalf Of Ashley Merrick
Sent: 03 June 2017 17:14
To: [email protected]
Subject: Re: [ceph-users] PG Stuck EC Pool
I have now done some further testing and am seeing these errors on OSDs 84 / 83,
the OSDs that crash while backfilling to 10,11 (a note on mapping the
failed_push object follows the log extract):
-60> 2017-06-03 10:08:56.651768 7f6f76714700 1 -- 172.16.3.14:6823/2694 <==
osd.3 172.16.2.101:0/25361 10 ==== osd_ping(ping e71688 stamp 2017-06-03
10:08:56.652035) v2 ==== 47+0+0 (1097709006 0 0) 0x5569ea88d400 con
0x5569e900e300
-59> 2017-06-03 10:08:56.651804 7f6f76714700 1 -- 172.16.3.14:6823/2694 -->
172.16.2.101:0/25361 -- osd_ping(ping_reply e71688 stamp 2017-06-03
10:08:56.652035) v2 -- ?+0 0x5569e985fc00 con 0x5569e900e300
-6> 2017-06-03 10:08:56.937156 7f6f5ee4d700 1 -- 172.16.3.14:6822/2694 <==
osd.53 172.16.3.7:6816/15230 13 ==== MOSDECSubOpReadReply(6.14s3 71688
ECSubReadReply(tid=83, attrs_read=0)) v1 ==== 148+0+0 (2355392791 0 0)
0x5569e8b22080 con 0x5569e9538f00
-5> 2017-06-03 10:08:56.937193 7f6f5ee4d700 5 -- op tracker -- seq: 2409,
time: 2017-06-03 10:08:56.937193, event: queued_for_pg, op:
MOSDECSubOpReadReply(6.14s3 71688 ECSubReadReply(tid=83, attrs_read=0))
-4> 2017-06-03 10:08:56.937241 7f6f8ef8a700 5 -- op tracker -- seq: 2409,
time: 2017-06-03 10:08:56.937240, event: reached_pg, op:
MOSDECSubOpReadReply(6.14s3 71688 ECSubReadReply(tid=83, attrs_read=0))
-3> 2017-06-03 10:08:56.937266 7f6f8ef8a700 0 osd.83 pg_epoch: 71688
pg[6.14s3( v 71685'35512 (68694'30812,71685'35512] local-les=71688 n=15928
ec=31534 les/c/f 71688/69510/67943 71687/71687/71687)
[11,10,2147483647,83,22,26,69,72,53,59,8,4,46]/[2147483647,2147483647,2147483647,83,22,26,69,72,53,59,8,4,46]
r=3 lpr=71687 pi=47065-71686/711 rops=1 bft=10(1),11(0) crt=71629'35509 mlcod
0'0 active+undersized+degraded+remapped+inconsistent+backfilling NIBBLEWISE]
failed_push 6:28170432:::rbd_data.e3d8852ae8944a.0000000000047d28:head from
shard 53(8), reps on unfound? 0
-2> 2017-06-03 10:08:56.937346 7f6f8ef8a700 5 -- op tracker -- seq: 2409,
time: 2017-06-03 10:08:56.937345, event: done, op: MOSDECSubOpReadReply(6.14s3
71688 ECSubReadReply(tid=83, attrs_read=0))
-1> 2017-06-03 10:08:56.937351 7f6f89f80700 -1 osd.83 pg_epoch: 71688
pg[6.14s3( v 71685'35512 (68694'30812,71685'35512] local-les=71688 n=15928
ec=31534 les/c/f 71688/69510/67943 71687/71687/71687)
[11,10,2147483647,83,22,26,69,72,53,59,8,4,46]/[2147483647,2147483647,2147483647,83,22,26,69,72,53,59,8,4,46]
r=3 lpr=71687 pi=47065-71686/711 bft=10(1),11(0) crt=71629'35509 mlcod 0'0
active+undersized+degraded+remapped+inconsistent+backfilling NIBBLEWISE]
recover_replicas: object added to missing set for backfill, but is not in
recovering, error!
-42> 2017-06-03 10:08:56.968433 7f6f5f04f700 1 -- 172.16.2.114:6822/2694
<== client.22857445 172.16.2.212:0/2238053329 56 ====
osd_op(client.22857445.1:759236283 2.e732321d
rbd_data.61b4c6238e1f29.000000000001ea27 [set-alloc-hint object_size 4194304
write_size 4194304,write 126976~45056] snapc 0=[] ondisk+write e71688) v4 ====
217+0+45056 (2626314663 0 3883338397) 0x5569ea886b00 con 0x5569ea99c880
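(The object named in the failed_push line above can be mapped back to its placement with ceph osd map; a minimal sketch, where "ecpool" only stands in for the real name of pool 6:)

# prints the PG id plus the up and acting OSD sets for the failed_push object
ceph osd map ecpool rbd_data.e3d8852ae8944a.0000000000047d28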
From: Ashley Merrick
Sent: 03 June 2017 14:27
To: '[email protected]' <[email protected]<mailto:[email protected]>>
Subject: RE: PG Stuck EC Pool
From this extract from the pg query:
"up": [
11,
10,
84,
83,
22,
26,
69,
72,
53,
59,
8,
4,
46
],
"acting": [
2147483647,
2147483647,
84,
83,
22,
26,
69,
72,
53,
59,
8,
4,
46
I am wondering if there is an issue on OSDs 11, 10 causing the current acting
primary ("acting_primary": 84) to crash, but I can't see anything that could be
causing it. (The 2147483647 entries are the "none" placeholder, i.e. no OSD is
currently acting for the first two shards.)
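(The up/acting sets and acting primary quoted above can be re-checked at any time; a minimal sketch using the standard commands:)

# one-line mapping of pg 6.14: current up set and acting set
ceph pg map 6.14
# full JSON, including "up", "acting" and "acting_primary"
ceph pg 6.14 query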
,Ashley
From: Ashley Merrick
Sent: 01 June 2017 23:39
To: [email protected]<mailto:[email protected]>
Subject: RE: PG Stuck EC Pool
I have attached the full pg query for the affected PG in case it shows anything
of interest.
Thanks
From: ceph-users [mailto:[email protected]] On Behalf Of Ashley Merrick
Sent: 01 June 2017 17:19
To: [email protected]<mailto:[email protected]>
Subject: [ceph-users] PG Stuck EC Pool
I have a PG which is stuck in this state (it is in an EC pool with K=10, M=3):
pg 6.14 is active+undersized+degraded+remapped+inconsistent+backfilling, acting
[2147483647,2147483647,84,83,22,26,69,72,53,59,8,4,46]
I currently have norecover set; if I unset norecover, both OSD 83 + 84 start to
flap and go up and down, and I see the following in the logs of the OSDs (the
flag commands used are sketched after the log excerpt):
*****
-5> 2017-06-01 10:08:29.658593 7f430ec97700 1 -- 172.16.3.14:6806/5204 <==
osd.17 172.16.3.3:6806/2006016 57 ==== MOSDECSubOpWriteReply(6.31as0 71513
ECSubWriteReply(tid=152, last_complete=0'0, committed=0, applied=1)) v1 ====
67+0+0 (245959818 0 0) 0x563c9db7be00 con 0x563c9cfca480
-4> 2017-06-01 10:08:29.658620 7f430ec97700 5 -- op tracker -- seq: 2367,
time: 2017-06-01 10:08:29.658620, event: queued_for_pg, op:
MOSDECSubOpWriteReply(6.31as0 71513 ECSubWriteReply(tid=152, last_complete=0'0,
committed=0, applied=1))
-3> 2017-06-01 10:08:29.658649 7f4319e11700 5 -- op tracker -- seq: 2367,
time: 2017-06-01 10:08:29.658649, event: reached_pg, op:
MOSDECSubOpWriteReply(6.31as0 71513 ECSubWriteReply(tid=152, last_complete=0'0,
committed=0, applied=1))
-2> 2017-06-01 10:08:29.658661 7f4319e11700 5 -- op tracker -- seq: 2367,
time: 2017-06-01 10:08:29.658660, event: done, op:
MOSDECSubOpWriteReply(6.31as0 71513 ECSubWriteReply(tid=152, last_complete=0'0,
committed=0, applied=1))
-1> 2017-06-01 10:08:29.663107 7f43320ec700 5 -- op tracker -- seq: 2317,
time: 2017-06-01 10:08:29.663107, event: sub_op_applied, op:
osd_op(osd.79.66617:8675008 6.82058b1a rbd_data.e5208a238e1f29.0000000000025f3e
[copy-from ver 4678410] snapc 0=[]
ondisk+write+ignore_overlay+enforce_snapc+known_if_redirected e71513)
0> 2017-06-01 10:08:29.663474 7f4319610700 -1 *** Caught signal (Aborted)
**
in thread 7f4319610700 thread_name:tp_osd_recov
ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
1: (()+0x9564a7) [0x563c6a6f24a7]
2: (()+0xf890) [0x7f4342308890]
3: (gsignal()+0x37) [0x7f434034f067]
4: (abort()+0x148) [0x7f4340350448]
5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x256) [0x563c6a7f83d6]
6: (ReplicatedPG::recover_replicas(int, ThreadPool::TPHandle&)+0x62f)
[0x563c6a2850ff]
7: (ReplicatedPG::start_recovery_ops(int, ThreadPool::TPHandle&, int*)+0xa8a)
[0x563c6a2b878a]
8: (OSD::do_recovery(PG*, ThreadPool::TPHandle&)+0x36d) [0x563c6a131bbd]
9: (ThreadPool::WorkQueue<PG>::_void_process(void*,
ThreadPool::TPHandle&)+0x1d) [0x563c6a17c88d]
10: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa9f) [0x563c6a7e8e3f]
11: (ThreadPool::WorkThread::entry()+0x10) [0x563c6a7e9d70]
12: (()+0x8064) [0x7f4342301064]
13: (clone()+0x6d) [0x7f434040262d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
interpret this.
*****
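(For reference, the norecover toggling described above uses the standard flag commands; a minimal sketch:)

# let recovery proceed and watch whether osd.83 / osd.84 get marked down
ceph osd unset norecover
ceph osd tree | grep -E 'osd\.(83|84)'
# pause recovery again once they start flapping
ceph osd set norecover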
What should my next steps be?
Thanks!
-86> 2017-06-03 10:42:27.680419 7f7172069700 5 -- op tracker -- seq: 14015,
time: 2017-06-03 10:42:27.680418, event: started, op:
osd_op(client.22857445.1:759980367 2.1884fbf0
rbd_data.61b4c6238e1f29.0000000000001c8b [set-alloc-hint object_size 4194304
write_size 4194304,write 3616768~65536] snapc 0=[] ondisk+write e71740)
-85> 2017-06-03 10:42:27.680423 7f7172069700 10 osd.84 pg_epoch: 71740
pg[2.3f0( v 71739'1378126 (71527'1375122,71739'1378126] local-les=71729 n=6271
ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728
crt=71739'1378124 lcod 71739'1378125 mlcod 71739'1378125 active+clean
NIBBLEWISE] execute_ctx 0x56146b613c00
-84> 2017-06-03 10:42:27.680441 7f7172069700 10 osd.84 pg_epoch: 71740
pg[2.3f0( v 71739'1378126 (71527'1375122,71739'1378126] local-les=71729 n=6271
ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728
crt=71739'1378124 lcod 71739'1378125 mlcod 71739'1378125 active+clean
NIBBLEWISE] do_op 2:0fdf2118:::rbd_data.61b4c6238e1f29.0000000000001c8b:head
[set-alloc-hint object_size 4194304 write_size 4194304,write 3616768~65536] ov
71739'1378125 av 71740'1378127 snapc 0=[] snapset 0=[]:[]+head
-83> 2017-06-03 10:42:27.680450 7f7172069700 10 osd.84 pg_epoch: 71740
pg[2.3f0( v 71739'1378126 (71527'1375122,71739'1378126] local-les=71729 n=6271
ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728
crt=71739'1378124 lcod 71739'1378125 mlcod 71739'1378125 active+clean
NIBBLEWISE] do_osd_op
2:0fdf2118:::rbd_data.61b4c6238e1f29.0000000000001c8b:head [set-alloc-hint
object_size 4194304 write_size 4194304,write 3616768~65536]
-82> 2017-06-03 10:42:27.680457 7f7172069700 10 osd.84 pg_epoch: 71740
pg[2.3f0( v 71739'1378126 (71527'1375122,71739'1378126] local-les=71729 n=6271
ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728
crt=71739'1378124 lcod 71739'1378125 mlcod 71739'1378125 active+clean
NIBBLEWISE] do_osd_op set-alloc-hint object_size 4194304 write_size 4194304
-81> 2017-06-03 10:42:27.680466 7f7172069700 10 osd.84 pg_epoch: 71740
pg[2.3f0( v 71739'1378126 (71527'1375122,71739'1378126] local-les=71729 n=6271
ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728
crt=71739'1378124 lcod 71739'1378125 mlcod 71739'1378125 active+clean
NIBBLEWISE] do_osd_op write 3616768~65536
-80> 2017-06-03 10:42:27.680474 7f7172069700 20 osd.84 pg_epoch: 71740
pg[2.3f0( v 71739'1378126 (71527'1375122,71739'1378126] local-les=71729 n=6271
ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728
crt=71739'1378124 lcod 71739'1378125 mlcod 71739'1378125 active+clean
NIBBLEWISE] make_writeable
2:0fdf2118:::rbd_data.61b4c6238e1f29.0000000000001c8b:head
snapset=0x56146baf81f8 snapc=0=[]
-79> 2017-06-03 10:42:27.680481 7f7172069700 20 osd.84 pg_epoch: 71740
pg[2.3f0( v 71739'1378126 (71527'1375122,71739'1378126] local-les=71729 n=6271
ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728
crt=71739'1378124 lcod 71739'1378125 mlcod 71739'1378125 active+clean
NIBBLEWISE] make_writeable
2:0fdf2118:::rbd_data.61b4c6238e1f29.0000000000001c8b:head done,
snapset=0=[]:[]+head
-78> 2017-06-03 10:42:27.680488 7f7172069700 20 osd.84 pg_epoch: 71740
pg[2.3f0( v 71739'1378126 (71527'1375122,71739'1378126] local-les=71729 n=6271
ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728
crt=71739'1378124 lcod 71739'1378125 mlcod 71739'1378125 active+clean
NIBBLEWISE] finish_ctx
2:0fdf2118:::rbd_data.61b4c6238e1f29.0000000000001c8b:head 0x56146b613c00 op
modify
-77> 2017-06-03 10:42:27.680500 7f7172069700 10 osd.84 pg_epoch: 71740
pg[2.3f0( v 71739'1378126 (71527'1375122,71739'1378126] local-les=71729 n=6271
ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728
crt=71739'1378124 lcod 71739'1378125 mlcod 71739'1378125 active+clean
NIBBLEWISE] set mtime to 2017-06-03 10:42:27.670488
-76> 2017-06-03 10:42:27.680511 7f7172069700 10 osd.84 pg_epoch: 71740
pg[2.3f0( v 71739'1378126 (71527'1375122,71739'1378126] local-les=71729 n=6271
ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728
crt=71739'1378124 lcod 71739'1378125 mlcod 71739'1378125 active+clean
NIBBLEWISE] final snapset 0=[]:[]+head in
2:0fdf2118:::rbd_data.61b4c6238e1f29.0000000000001c8b:head
-75> 2017-06-03 10:42:27.680524 7f7172069700 20 osd.84 pg_epoch: 71740
pg[2.3f0( v 71739'1378126 (71527'1375122,71739'1378126] local-les=71729 n=6271
ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728
crt=71739'1378124 lcod 71739'1378125 mlcod 71739'1378125 active+clean
NIBBLEWISE] zeroing write result code 0
-74> 2017-06-03 10:42:27.680531 7f7172069700 10 osd.84 pg_epoch: 71740
pg[2.3f0( v 71739'1378126 (71527'1375122,71739'1378126] local-les=71729 n=6271
ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728
crt=71739'1378124 lcod 71739'1378125 mlcod 71739'1378125 active+clean
NIBBLEWISE] new_repop rep_tid 1391 on osd_op(client.22857445.1:759980367
2.1884fbf0 rbd_data.61b4c6238e1f29.0000000000001c8b [set-alloc-hint object_size
4194304 write_size 4194304,write 3616768~65536] snapc 0=[] ondisk+write e71740)
v4
-73> 2017-06-03 10:42:27.680538 7f7172069700 7 osd.84 pg_epoch: 71740
pg[2.3f0( v 71739'1378126 (71527'1375122,71739'1378126] local-les=71729 n=6271
ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728
crt=71739'1378124 lcod 71739'1378125 mlcod 71739'1378125 active+clean
NIBBLEWISE] issue_repop rep_tid 1391 o
2:0fdf2118:::rbd_data.61b4c6238e1f29.0000000000001c8b:head
-72> 2017-06-03 10:42:27.680580 7f7172069700 5 -- op tracker -- seq: 14015,
time: 2017-06-03 10:42:27.680580, event: waiting for subops from 11, op:
osd_op(client.22857445.1:759980367 2.1884fbf0
rbd_data.61b4c6238e1f29.0000000000001c8b [set-alloc-hint object_size 4194304
write_size 4194304,write 3616768~65536] snapc 0=[] ondisk+write e71740)
-71> 2017-06-03 10:42:27.680597 7f7172069700 20 osd.84 71740 share_map_peer
0x561465ddda80 already has epoch 71740
-70> 2017-06-03 10:42:27.680599 7f7172069700 1 -- 172.16.3.14:6806/11590
--> 172.16.3.3:6818/14977 -- osd_repop(client.22857445.1:759980367 2.3f0
2:0fdf2118:::rbd_data.61b4c6238e1f29.0000000000001c8b:head v 71740'1378127) v1
-- ?+66267 0x56146bba9600 con 0x561465ddda80
-69> 2017-06-03 10:42:27.680613 7f7172069700 10 osd.84 pg_epoch: 71740
pg[2.3f0( v 71739'1378126 (71527'1375122,71739'1378126] local-les=71729 n=6271
ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728
crt=71739'1378124 lcod 71739'1378125 mlcod 71739'1378125 active+clean
NIBBLEWISE] append_log log((71527'1375122,71739'1378126], crt=71739'1378124)
[71740'1378127 (71739'1378125) modify
2:0fdf2118:::rbd_data.61b4c6238e1f29.0000000000001c8b:head by
client.22857445.1:759980367 2017-06-03 10:42:27.670488]
-68> 2017-06-03 10:42:27.680626 7f7172069700 10 osd.84 pg_epoch: 71740
pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271
ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728
luod=71739'1378126 lua=71739'1378126 crt=71739'1378124 lcod 71739'1378125 mlcod
71739'1378125 active+clean NIBBLEWISE] add_log_entry 71740'1378127
(71739'1378125) modify
2:0fdf2118:::rbd_data.61b4c6238e1f29.0000000000001c8b:head by
client.22857445.1:759980367 2017-06-03 10:42:27.670488
-67> 2017-06-03 10:42:27.680636 7f7172069700 10 osd.84 pg_epoch: 71740
pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271
ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728
luod=71739'1378126 lua=71739'1378126 crt=71739'1378125 lcod 71739'1378125 mlcod
71739'1378125 active+clean NIBBLEWISE] append_log: trimming to 71739'1378125
entries 71739'1378125 (71722'1378109) modify
2:0fdf2118:::rbd_data.61b4c6238e1f29.0000000000001c8b:head by
client.22857445.1:759960105 2017-06-03 10:41:29.573977
-66> 2017-06-03 10:42:27.680653 7f7172069700 5 write_log with: dirty_to:
0'0, dirty_from: 4294967295'18446744073709551615, dirty_divergent_priors:
false, divergent_priors: 0, writeout_from: 71740'1378127, trimmed:
-65> 2017-06-03 10:42:27.680711 7f7172069700 5 -- op tracker -- seq: 14015,
time: 2017-06-03 10:42:27.680711, event: commit_queued_for_journal_write, op:
osd_op(client.22857445.1:759980367 2.1884fbf0
rbd_data.61b4c6238e1f29.0000000000001c8b [set-alloc-hint object_size 4194304
write_size 4194304,write 3616768~65536] snapc 0=[] ondisk+write e71740)
-64> 2017-06-03 10:42:27.680722 7f7172069700 10 osd.84 pg_epoch: 71740
pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271
ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728
luod=71739'1378126 lua=71739'1378126 crt=71739'1378125 lcod 71739'1378125 mlcod
71739'1378125 active+clean NIBBLEWISE] eval_repop repgather(0x56144dec5c20
71740'1378127 rep_tid=1391 committed?=0 applied?=0) wants=d
-63> 2017-06-03 10:42:27.680731 7f7172069700 10 osd.84 71740 dequeue_op
0x56146d3bf700 finish
-62> 2017-06-03 10:42:27.680732 7f71891ef700 5 -- op tracker -- seq: 14015,
time: 2017-06-03 10:42:27.680731, event: write_thread_in_journal_buffer, op:
osd_op(client.22857445.1:759980367 2.1884fbf0
rbd_data.61b4c6238e1f29.0000000000001c8b [set-alloc-hint object_size 4194304
write_size 4194304,write 3616768~65536] snapc 0=[] ondisk+write e71740)
-61> 2017-06-03 10:42:27.680968 7f71889ee700 5 -- op tracker -- seq: 14015,
time: 2017-06-03 10:42:27.680968, event: journaled_completion_queued, op:
osd_op(client.22857445.1:759980367 2.1884fbf0
rbd_data.61b4c6238e1f29.0000000000001c8b [set-alloc-hint object_size 4194304
write_size 4194304,write 3616768~65536] snapc 0=[] ondisk+write e71740)
-60> 2017-06-03 10:42:27.680983 7f71869ea700 10 osd.84 pg_epoch: 71740
pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271
ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728
luod=71739'1378126 lua=71739'1378126 crt=71739'1378125 lcod 71739'1378125 mlcod
71739'1378125 active+clean NIBBLEWISE] op_commit: 1391
-59> 2017-06-03 10:42:27.680998 7f71869ea700 5 -- op tracker -- seq: 14015,
time: 2017-06-03 10:42:27.680997, event: op_commit, op:
osd_op(client.22857445.1:759980367 2.1884fbf0
rbd_data.61b4c6238e1f29.0000000000001c8b [set-alloc-hint object_size 4194304
write_size 4194304,write 3616768~65536] snapc 0=[] ondisk+write e71740)
-58> 2017-06-03 10:42:27.681335 7f71861e9700 10 osd.84 pg_epoch: 71740
pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271
ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728
luod=71739'1378126 lua=71739'1378126 crt=71739'1378125 lcod 71739'1378125 mlcod
71739'1378125 active+clean NIBBLEWISE] op_applied: 1391
-57> 2017-06-03 10:42:27.681353 7f71861e9700 5 -- op tracker -- seq: 14015,
time: 2017-06-03 10:42:27.681353, event: op_applied, op:
osd_op(client.22857445.1:759980367 2.1884fbf0
rbd_data.61b4c6238e1f29.0000000000001c8b [set-alloc-hint object_size 4194304
write_size 4194304,write 3616768~65536] snapc 0=[] ondisk+write e71740)
-56> 2017-06-03 10:42:27.681364 7f71861e9700 10 osd.84 pg_epoch: 71740
pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271
ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728
luod=71739'1378126 lua=71739'1378126 crt=71739'1378125 lcod 71739'1378125 mlcod
71739'1378125 active+clean NIBBLEWISE] op_applied version 71740'1378127
-55> 2017-06-03 10:42:27.683398 7f7160fd0700 1 -- 172.16.3.14:6806/11590
<== osd.11 172.16.3.3:6818/14977 147 ====
osd_repop_reply(client.22857445.1:759980367 2.3f0) v1 ==== 83+0+0 (3329426065 0
0) 0x56146914d180 con 0x561465ddda80
-54> 2017-06-03 10:42:27.683413 7f7160fd0700 10 osd.84 71740
handle_replica_op osd_repop_reply(client.22857445.1:759980367 2.3f0) v1 epoch
71740
-53> 2017-06-03 10:42:27.683417 7f7160fd0700 20 osd.84 71740
should_share_map osd.11 172.16.3.3:6818/14977 71740
-52> 2017-06-03 10:42:27.683421 7f7160fd0700 15 osd.84 71740 enqueue_op
0x56146738aa00 prio 196 cost 0 latency 0.000046
osd_repop_reply(client.22857445.1:759980367 2.3f0) v1
-51> 2017-06-03 10:42:27.683424 7f7160fd0700 5 -- op tracker -- seq: 14016,
time: 2017-06-03 10:42:27.683424, event: queued_for_pg, op:
osd_repop_reply(client.22857445.1:759980367 2.3f0)
-50> 2017-06-03 10:42:27.683443 7f716f864700 10 osd.84 71740 dequeue_op
0x56146738aa00 prio 196 cost 0 latency 0.000067
osd_repop_reply(client.22857445.1:759980367 2.3f0) v1 pg pg[2.3f0( v
71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271 ec=130
les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728
luod=71739'1378126 crt=71739'1378125 lcod 71739'1378125 mlcod 71739'1378125
active+clean NIBBLEWISE]
-49> 2017-06-03 10:42:27.683457 7f716f864700 5 -- op tracker -- seq: 14016,
time: 2017-06-03 10:42:27.683457, event: reached_pg, op:
osd_repop_reply(client.22857445.1:759980367 2.3f0)
-48> 2017-06-03 10:42:27.683462 7f716f864700 10 osd.84 pg_epoch: 71740
pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271
ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728
luod=71739'1378126 crt=71739'1378125 lcod 71739'1378125 mlcod 71739'1378125
active+clean NIBBLEWISE] handle_message: 0x56146738aa00
-47> 2017-06-03 10:42:27.683471 7f716f864700 5 -- op tracker -- seq: 14016,
time: 2017-06-03 10:42:27.683470, event: started, op:
osd_repop_reply(client.22857445.1:759980367 2.3f0 ondisk, result = 0)
-46> 2017-06-03 10:42:27.683473 7f716f864700 7 osd.84 pg_epoch: 71740
pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271
ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728
luod=71739'1378126 crt=71739'1378125 lcod 71739'1378125 mlcod 71739'1378125
active+clean NIBBLEWISE] sub_op_modify_reply: tid 1391 op ack_type 4 from 11
-45> 2017-06-03 10:42:27.683481 7f716f864700 5 -- op tracker -- seq: 14015,
time: 2017-06-03 10:42:27.683481, event: sub_op_commit_rec from 11, op:
osd_op(client.22857445.1:759980367 2.1884fbf0
rbd_data.61b4c6238e1f29.0000000000001c8b [set-alloc-hint object_size 4194304
write_size 4194304,write 3616768~65536] snapc 0=[] ondisk+write e71740)
-44> 2017-06-03 10:42:27.683490 7f716f864700 10 osd.84 pg_epoch: 71740
pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271
ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728
luod=71739'1378126 crt=71739'1378125 lcod 71739'1378125 mlcod 71739'1378125
active+clean NIBBLEWISE] repop_all_applied: repop tid 1391 all applied
-43> 2017-06-03 10:42:27.683496 7f716f864700 10 osd.84 pg_epoch: 71740
pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271
ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728
luod=71739'1378126 crt=71739'1378125 lcod 71739'1378125 mlcod 71739'1378125
active+clean NIBBLEWISE] eval_repop repgather(0x56144dec5c20 71740'1378127
rep_tid=1391 committed?=0 applied?=1) wants=d
-42> 2017-06-03 10:42:27.683502 7f716f864700 10 osd.84 pg_epoch: 71740
pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271
ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728
luod=71739'1378126 crt=71739'1378125 lcod 71739'1378125 mlcod 71739'1378125
active+clean NIBBLEWISE] applied: repgather(0x56144dec5c20 71740'1378127
rep_tid=1391 committed?=0 applied?=1)
-41> 2017-06-03 10:42:27.683508 7f716f864700 10 osd.84 pg_epoch: 71740
pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271
ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728
luod=71739'1378126 crt=71739'1378125 lcod 71739'1378125 mlcod 71739'1378125
active+clean NIBBLEWISE] repop_all_committed: repop tid 1391 all committed
-40> 2017-06-03 10:42:27.683513 7f716f864700 10 osd.84 pg_epoch: 71740
pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271
ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728
crt=71739'1378125 lcod 71739'1378126 mlcod 71739'1378125 active+clean
NIBBLEWISE] eval_repop repgather(0x56144dec5c20 71740'1378127 rep_tid=1391
committed?=1 applied?=1) wants=d
-39> 2017-06-03 10:42:27.683519 7f716f864700 10 osd.84 pg_epoch: 71740
pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271
ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728
crt=71739'1378125 lcod 71739'1378126 mlcod 71739'1378125 active+clean
NIBBLEWISE] commit: repgather(0x56144dec5c20 71740'1378127 rep_tid=1391
committed?=1 applied?=1)
-38> 2017-06-03 10:42:27.683525 7f716f864700 15 osd.84 pg_epoch: 71740
pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271
ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728
crt=71739'1378125 lcod 71739'1378126 mlcod 71739'1378125 active+clean
NIBBLEWISE] log_op_stats osd_op(client.22857445.1:759980367 2.1884fbf0
rbd_data.61b4c6238e1f29.0000000000001c8b [set-alloc-hint object_size 4194304
write_size 4194304,write 3616768~65536] snapc 0=[] ondisk+write e71740) v4 inb
65536 outb 0 rlat 0.003504 lat 0.003521
-37> 2017-06-03 10:42:27.683533 7f716f864700 10 osd.84 pg_epoch: 71740
pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271
ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728
crt=71739'1378125 lcod 71739'1378126 mlcod 71739'1378125 active+clean
NIBBLEWISE] sending commit on osd_op(client.22857445.1:759980367 2.1884fbf0
rbd_data.61b4c6238e1f29.0000000000001c8b [set-alloc-hint object_size 4194304
write_size 4194304,write 3616768~65536] snapc 0=[] ondisk+write e71740) v4
0x561469257e40
-36> 2017-06-03 10:42:27.683540 7f716f864700 1 -- 172.16.2.114:6806/11590
--> 172.16.2.212:0/2238053329 -- osd_op_reply(759980367
rbd_data.61b4c6238e1f29.0000000000001c8b [set-alloc-hint object_size 4194304
write_size 4194304,write 3616768~65536] v71740'1378127 uv6406885 ondisk = 0) v7
-- ?+0 0x561469257e40 con 0x5614678fb480
-35> 2017-06-03 10:42:27.683553 7f716f864700 5 -- op tracker -- seq: 14015,
time: 2017-06-03 10:42:27.683553, event: commit_sent, op:
osd_op(client.22857445.1:759980367 2.1884fbf0
rbd_data.61b4c6238e1f29.0000000000001c8b [set-alloc-hint object_size 4194304
write_size 4194304,write 3616768~65536] snapc 0=[] ondisk+write e71740)
-34> 2017-06-03 10:42:27.683561 7f716f864700 10 osd.84 pg_epoch: 71740
pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271
ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728
crt=71739'1378125 lcod 71739'1378126 mlcod 71739'1378125 active+clean
NIBBLEWISE] applied: repgather(0x56144dec5c20 71740'1378127 rep_tid=1391
committed?=1 applied?=1)
-33> 2017-06-03 10:42:27.683573 7f716f864700 15 osd.84 pg_epoch: 71740
pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271
ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728
crt=71739'1378125 lcod 71739'1378126 mlcod 71739'1378125 active+clean
NIBBLEWISE] publish_stats_to_osd 71740:1137643
-32> 2017-06-03 10:42:27.683582 7f716f864700 15 osd.84 pg_epoch: 71740
pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271
ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728
crt=71739'1378125 lcod 71739'1378126 mlcod 71739'1378126 active+clean
NIBBLEWISE] do_osd_op_effects client.22857445 con 0x5614678fb480
-31> 2017-06-03 10:42:27.683592 7f716f864700 10 osd.84 pg_epoch: 71740
pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271
ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728
crt=71739'1378125 lcod 71739'1378126 mlcod 71739'1378126 active+clean
NIBBLEWISE] removing repgather(0x56144dec5c20 71740'1378127 rep_tid=1391
committed?=1 applied?=1)
-30> 2017-06-03 10:42:27.683601 7f716f864700 20 osd.84 pg_epoch: 71740
pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271
ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728
crt=71739'1378125 lcod 71739'1378126 mlcod 71739'1378126 active+clean
NIBBLEWISE] q front is repgather(0x56144dec5c20 71740'1378127 rep_tid=1391
committed?=1 applied?=1)
-29> 2017-06-03 10:42:27.683610 7f716f864700 20 osd.84 pg_epoch: 71740
pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271
ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728
crt=71739'1378125 lcod 71739'1378126 mlcod 71739'1378126 active+clean
NIBBLEWISE] remove_repop repgather(0x56144dec5c20 71740'1378127 rep_tid=1391
committed?=1 applied?=1)
-28> 2017-06-03 10:42:27.683623 7f716f864700 15 osd.84 pg_epoch: 71740
pg[2.3f0( v 71740'1378127 (71527'1375122,71740'1378127] local-les=71729 n=6271
ec=130 les/c/f 71729/71729/67943 71727/71728/71728) [84,11] r=0 lpr=71728
crt=71739'1378125 lcod 71739'1378126 mlcod 71739'1378126 active+clean
NIBBLEWISE] requeue_ops
-27> 2017-06-03 10:42:27.683634 7f716f864700 5 -- op tracker -- seq: 14015,
time: 2017-06-03 10:42:27.683633, event: done, op:
osd_op(client.22857445.1:759980367 2.1884fbf0
rbd_data.61b4c6238e1f29.0000000000001c8b [set-alloc-hint object_size 4194304
write_size 4194304,write 3616768~65536] snapc 0=[] ondisk+write e71740)
-26> 2017-06-03 10:42:27.683654 7f716f864700 10 osd.84 71740 dequeue_op
0x56146738aa00 finish
-25> 2017-06-03 10:42:27.683658 7f716f864700 5 -- op tracker -- seq: 14016,
time: 2017-06-03 10:42:27.683657, event: done, op:
osd_repop_reply(client.22857445.1:759980367 2.3f0 ondisk, result = 0)
-24> 2017-06-03 10:42:27.690237 7f7140a75700 1 -- 172.16.2.114:6807/11590
<== osd.35 172.16.2.105:0/4630 74 ==== osd_ping(ping e71740 stamp 2017-06-03
10:42:27.691894) v2 ==== 47+0+0 (1336756362 0 0) 0x56146804e800 con
0x561467283900
-23> 2017-06-03 10:42:27.690252 7f7140a75700 1 -- 172.16.2.114:6807/11590
--> 172.16.2.105:0/4630 -- osd_ping(ping_reply e71740 stamp 2017-06-03
10:42:27.691894) v2 -- ?+0 0x561468151e00 con 0x561467283900
-22> 2017-06-03 10:42:27.690265 7f7140a75700 20 osd.84 71740 share_map_peer
0x561442ea1000 already has epoch 71740
-21> 2017-06-03 10:42:27.690258 7f7140b76700 1 -- 172.16.3.14:6807/11590
<== osd.35 172.16.2.105:0/4630 74 ==== osd_ping(ping e71740 stamp 2017-06-03
10:42:27.691894) v2 ==== 47+0+0 (1336756362 0 0) 0x56146804ee00 con
0x5614678f8300
-20> 2017-06-03 10:42:27.690275 7f7140b76700 1 -- 172.16.3.14:6807/11590
--> 172.16.2.105:0/4630 -- osd_ping(ping_reply e71740 stamp 2017-06-03
10:42:27.691894) v2 -- ?+0 0x56146d350000 con 0x5614678f8300
-19> 2017-06-03 10:42:27.690287 7f7140b76700 20 osd.84 71740 share_map_peer
0x561442ea1000 already has epoch 71740
-18> 2017-06-03 10:42:27.713217 7f713ee59700 1 -- 172.16.3.14:6807/11590
<== osd.38 172.16.2.105:0/4910 78 ==== osd_ping(ping e71740 stamp 2017-06-03
10:42:27.714244) v2 ==== 47+0+0 (140188268 0 0) 0x561468170600 con
0x5614678f8d80
-17> 2017-06-03 10:42:27.713238 7f713ee59700 1 -- 172.16.3.14:6807/11590
--> 172.16.2.105:0/4910 -- osd_ping(ping_reply e71740 stamp 2017-06-03
10:42:27.714244) v2 -- ?+0 0x561469cb1c00 con 0x5614678f8d80
-16> 2017-06-03 10:42:27.713254 7f713ee59700 20 osd.84 71740 share_map_peer
0x56146631cb80 already has epoch 71740
-15> 2017-06-03 10:42:27.713463 7f713ef5a700 1 -- 172.16.2.114:6807/11590
<== osd.38 172.16.2.105:0/4910 78 ==== osd_ping(ping e71740 stamp 2017-06-03
10:42:27.714244) v2 ==== 47+0+0 (140188268 0 0) 0x561468170800 con
0x5614678f9500
-14> 2017-06-03 10:42:27.713486 7f713ef5a700 1 -- 172.16.2.114:6807/11590
--> 172.16.2.105:0/4910 -- osd_ping(ping_reply e71740 stamp 2017-06-03
10:42:27.714244) v2 -- ?+0 0x561469cb1200 con 0x5614678f9500
-13> 2017-06-03 10:42:27.713498 7f713ef5a700 20 osd.84 71740 share_map_peer
0x56146631cb80 already has epoch 71740
-12> 2017-06-03 10:42:27.722765 7f7148bf6700 1 -- 172.16.3.14:6807/11590
<== osd.15 172.16.3.3:0/5388 79 ==== osd_ping(ping e71740 stamp 2017-06-03
10:42:27.713200) v2 ==== 47+0+0 (1176829029 0 0) 0x5614671ed000 con
0x56146824cd80
-11> 2017-06-03 10:42:27.722794 7f7148bf6700 1 -- 172.16.3.14:6807/11590
--> 172.16.3.3:0/5388 -- osd_ping(ping_reply e71740 stamp 2017-06-03
10:42:27.713200) v2 -- ?+0 0x56146cc2bc00 con 0x56146824cd80
-10> 2017-06-03 10:42:27.722808 7f7148bf6700 20 osd.84 71740 share_map_peer
0x561466029a80 already has epoch 71740
-9> 2017-06-03 10:42:27.722960 7f7148af5700 1 -- 172.16.2.114:6807/11590
<== osd.15 172.16.3.3:0/5388 79 ==== osd_ping(ping e71740 stamp 2017-06-03
10:42:27.713200) v2 ==== 47+0+0 (1176829029 0 0) 0x5614671ed200 con
0x56146824d980
-8> 2017-06-03 10:42:27.722977 7f7148af5700 1 -- 172.16.2.114:6807/11590
--> 172.16.3.3:0/5388 -- osd_ping(ping_reply e71740 stamp 2017-06-03
10:42:27.713200) v2 -- ?+0 0x56146645ae00 con 0x56146824d980
-7> 2017-06-03 10:42:27.722991 7f7148af5700 20 osd.84 71740 share_map_peer
0x561466029a80 already has epoch 71740
-6> 2017-06-03 10:42:27.727100 7f713fa65700 1 -- 172.16.2.114:6807/11590
<== osd.57 172.16.2.107:0/3819 74 ==== osd_ping(ping e71740 stamp 2017-06-03
10:42:27.727036) v2 ==== 47+0+0 (2829453915 0 0) 0x56146751c600 con
0x5614678f9080
-5> 2017-06-03 10:42:27.727114 7f713fa65700 1 -- 172.16.2.114:6807/11590
--> 172.16.2.107:0/3819 -- osd_ping(ping_reply e71740 stamp 2017-06-03
10:42:27.727036) v2 -- ?+0 0x56146986f600 con 0x5614678f9080
-4> 2017-06-03 10:42:27.727126 7f713fa65700 20 osd.84 71740 share_map_peer
0x561466746300 already has epoch 71740
-3> 2017-06-03 10:42:27.727895 7f713fb66700 1 -- 172.16.3.14:6807/11590
<== osd.57 172.16.2.107:0/3819 74 ==== osd_ping(ping e71740 stamp 2017-06-03
10:42:27.727036) v2 ==== 47+0+0 (2829453915 0 0) 0x56146751c000 con
0x5614678f8900
-2> 2017-06-03 10:42:27.727914 7f713fb66700 1 -- 172.16.3.14:6807/11590
--> 172.16.2.107:0/3819 -- osd_ping(ping_reply e71740 stamp 2017-06-03
10:42:27.727036) v2 -- ?+0 0x56146c891000 con 0x5614678f8900
-1> 2017-06-03 10:42:27.727927 7f713fb66700 20 osd.84 71740 share_map_peer
0x561466746300 already has epoch 71740
0> 2017-06-03 10:42:27.743760 7f716e862700 -1 *** Caught signal (Aborted)
**
in thread 7f716e862700 thread_name:tp_osd_recov
ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
1: (()+0x9564a7) [0x5614333db4a7]
2: (()+0xf890) [0x7f719633e890]
3: (gsignal()+0x37) [0x7f7194385067]
4: (abort()+0x148) [0x7f7194386448]
5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x256) [0x5614334e13d6]
6: (ReplicatedPG::recover_replicas(int, ThreadPool::TPHandle&)+0x62f)
[0x561432f6e0ff]
7: (ReplicatedPG::start_recovery_ops(int, ThreadPool::TPHandle&, int*)+0xa8a)
[0x561432fa178a]
8: (OSD::do_recovery(PG*, ThreadPool::TPHandle&)+0x36d) [0x561432e1abbd]
9: (ThreadPool::WorkQueue<PG>::_void_process(void*,
ThreadPool::TPHandle&)+0x1d) [0x561432e6588d]
10: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa9f) [0x5614334d1e3f]
11: (ThreadPool::WorkThread::entry()+0x10) [0x5614334d2d70]
12: (()+0x8064) [0x7f7196337064]
13: (clone()+0x6d) [0x7f719443862d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
interpret this.
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com