Hi Orit,

We're running on jewel, version 10.2.7.

I ran the bi list command with the debugging options, and this is the end of the output:

2017-07-05 08:50:19.705673 7ff3bfefe700  1 -- 10.21.4.1:0/3313807338 <== osd.3 10.21.4.111:6810/3633200 2297 ==== osd_op_reply(2571 .dir.be-east.5582981.76.0 [call] v0'0 uv65572318 ondisk = 0) v7 ==== 145+0+385625 (3432176685 0 1775102993) 0x7ff3b00041a0 con 0x7ff4272f48f0
2017-07-05 08:50:19.724193 7ff4250219c0  1 -- 10.21.4.1:0/3313807338 --> 10.21.4.111:6810/3633200 -- osd_op(client.5971646.0:2572 48.1b47291b .dir.be-east.5582981.76.0 [call rgw.bi_list] snapc 0=[] ack+read+known_if_redirected e31545) v7 -- ?+0 0x7ff427327400 con 0x7ff4272f48f0
2017-07-05 08:50:19.767758 7ff3bfefe700  1 -- 10.21.4.1:0/3313807338 <== osd.3 10.21.4.111:6810/3633200 2298 ==== osd_op_reply(2572 .dir.be-east.5582981.76.0 [call] v0'0 uv65572318 ondisk = 0) v7 ==== 145+0+385625 (3432176685 0 2330398289) 0x7ff3b00041a0 con 0x7ff4272f48f0
2017-07-05 08:50:19.786309 7ff4250219c0  1 -- 10.21.4.1:0/3313807338 --> 10.21.4.111:6810/3633200 -- osd_op(client.5971646.0:2573 48.1b47291b .dir.be-east.5582981.76.0 [call rgw.bi_list] snapc 0=[] ack+read+known_if_redirected e31545) v7 -- ?+0 0x7ff427327400 con 0x7ff4272f48f0
2017-07-05 08:50:19.827960 7ff3bfefe700  1 -- 10.21.4.1:0/3313807338 <== osd.3 10.21.4.111:6810/3633200 2299 ==== osd_op_reply(2573 .dir.be-east.5582981.76.0 [call] v0'0 uv65572318 ondisk = 0) v7 ==== 145+0+385625 (3432176685 0 1724305540) 0x7ff3b00041a0 con 0x7ff4272f48f0
2017-07-05 08:50:19.846588 7ff4250219c0  1 -- 10.21.4.1:0/3313807338 --> 10.21.4.111:6810/3633200 -- osd_op(client.5971646.0:2574 48.1b47291b .dir.be-east.5582981.76.0 [call rgw.bi_list] snapc 0=[] ack+read+known_if_redirected e31545) v7 -- ?+0 0x7ff427327400 con 0x7ff4272f48f0
2017-07-05 08:50:19.870830 7ff3bfefe700  1 -- 10.21.4.1:0/3313807338 <== osd.3 10.21.4.111:6810/3633200 2300 ==== osd_op_reply(2574 .dir.be-east.5582981.76.0 [call] v0'0 uv0 ondisk = -4 ((4) Interrupted system call)) v7 ==== 145+0+0 (798610401 0 0) 0x7ff3b00041a0 con 0x7ff4272f48f0
ERROR: bi_list(): (4) Interrupted system call
2017-07-05 08:50:19.872489 7ff4250219c0  1 -- 10.21.4.1:0/3313807338 --> 10.21.4.112:6822/2795125 -- osd_op(client.5971646.0:2575 24.4322fa9f notify.0 [watch unwatch cookie 140686606221264] snapc 0=[] ondisk+write+known_if_redirected e31545) v7 -- ?+0 0x7ff4272d5950 con 0x7ff427302b10
2017-07-05 08:50:19.878128 7ff3bf0f7700  1 -- 10.21.4.1:0/3313807338 <== osd.23 10.21.4.112:6822/2795125 63 ==== osd_op_reply(2575 notify.0 [watch unwatch cookie 140686606221264] v31545'6808 uv6416 ondisk = 0) v7 ==== 128+0+0 (3462997515 0 0) 0x7ff3980014f0 con 0x7ff427302b10
2017-07-05 08:50:19.878221 7ff4250219c0 20 remove_watcher() i=0
2017-07-05 08:50:19.878229 7ff4250219c0  2 removed watcher, disabling cache
2017-07-05 08:50:19.878278 7ff4250219c0  1 -- 10.21.4.1:0/3313807338 --> 10.21.4.113:6807/2176843 -- osd_op(client.5971646.0:2576 24.16dafda0 notify.1 [watch unwatch cookie 140686606235888] snapc 0=[] ondisk+write+known_if_redirected e31545) v7 -- ?+0 0x7ff4272d5950 con 0x7ff427304ae0
2017-07-05 08:50:19.880843 7ff3beef5700  1 -- 10.21.4.1:0/3313807338 <== osd.27 10.21.4.113:6807/2176843 63 ==== osd_op_reply(2576 notify.1 [watch unwatch cookie 140686606235888] v31545'6706 uv6304 ondisk = 0) v7 ==== 128+0+0 (4086455760 0 0) 0x7ff3900014f0 con 0x7ff427304ae0
2017-07-05 08:50:19.880910 7ff4250219c0 20 remove_watcher() i=1
2017-07-05 08:50:19.880940 7ff4250219c0  1 -- 10.21.4.1:0/3313807338 --> 10.21.4.111:6802/3632911 -- osd_op(client.5971646.0:2577 24.88aa5c95 notify.2 [watch unwatch cookie 140686606250416] snapc 0=[] ondisk+write+known_if_redirected e31545) v7 -- ?+0 0x7ff4272d5950 con 0x7ff4273083d0
2017-07-05 08:50:19.886387 7ff3becf3700  1 -- 10.21.4.1:0/3313807338 <== osd.1 10.21.4.111:6802/3632911 94 ==== osd_op_reply(2577 notify.2 [watch unwatch cookie 140686606250416] v31545'10057 uv9497 ondisk = 0) v7 ==== 128+0+0 (2583541993 0 0) 0x7ff388001630 con 0x7ff4273083d0
2017-07-05 08:50:19.886476 7ff4250219c0 20 remove_watcher() i=2
2017-07-05 08:50:19.886513 7ff4250219c0  1 -- 10.21.4.1:0/3313807338 --> 10.21.4.111:6814/3633408 -- osd_op(client.5971646.0:2578 24.f8c99aee notify.3 [watch unwatch cookie 140686606264944] snapc 0=[] ondisk+write+known_if_redirected e31545) v7 -- ?+0 0x7ff4272d5950 con 0x7ff42730bca0
2017-07-05 08:50:19.888815 7ff3beaf1700  1 -- 10.21.4.1:0/3313807338 <== osd.5 10.21.4.111:6814/3633408 32 ==== osd_op_reply(2578 notify.3 [watch unwatch cookie 140686606264944] v31545'3419 uv3231 ondisk = 0) v7 ==== 128+0+0 (1994465853 0 0) 0x7ff380000a20 con 0x7ff42730bca0
2017-07-05 08:50:19.888893 7ff4250219c0 20 remove_watcher() i=3
2017-07-05 08:50:19.888940 7ff4250219c0  1 -- 10.21.4.1:0/3313807338 --> 10.21.4.111:6802/3632911 -- osd_op(client.5971646.0:2579 24.a204812d notify.4 [watch unwatch cookie 140686606267200] snapc 0=[] ondisk+write+known_if_redirected e31545) v7 -- ?+0 0x7ff4272d5950 con 0x7ff4273083d0
2017-07-05 08:50:19.891441 7ff3becf3700  1 -- 10.21.4.1:0/3313807338 <== osd.1 10.21.4.111:6802/3632911 95 ==== osd_op_reply(2579 notify.4 [watch unwatch cookie 140686606267200] v31545'10058 uv9499 ondisk = 0) v7 ==== 128+0+0 (840319076 0 0) 0x7ff388001630 con 0x7ff4273083d0
2017-07-05 08:50:19.891511 7ff4250219c0 20 remove_watcher() i=4
2017-07-05 08:50:19.891545 7ff4250219c0  1 -- 10.21.4.1:0/3313807338 --> 10.21.4.112:6822/2795125 -- osd_op(client.5971646.0:2580 24.31099063 notify.5 [watch unwatch cookie 140686606269328] snapc 0=[] ondisk+write+known_if_redirected e31545) v7 -- ?+0 0x7ff4272d5950 con 0x7ff427302b10
2017-07-05 08:50:19.893535 7ff3bf0f7700  1 -- 10.21.4.1:0/3313807338 <== osd.23 10.21.4.112:6822/2795125 64 ==== osd_op_reply(2580 notify.5 [watch unwatch cookie 140686606269328] v31545'6809 uv6418 ondisk = 0) v7 ==== 128+0+0 (3042799304 0 0) 0x7ff3980014f0 con 0x7ff427302b10
2017-07-05 08:50:19.893592 7ff4250219c0 20 remove_watcher() i=5
2017-07-05 08:50:19.893624 7ff4250219c0  1 -- 10.21.4.1:0/3313807338 --> 10.21.4.113:6807/2176843 -- osd_op(client.5971646.0:2581 24.97c520d4 notify.6 [watch unwatch cookie 140686606271968] snapc 0=[] ondisk+write+known_if_redirected e31545) v7 -- ?+0 0x7ff4272d5950 con 0x7ff427304ae0
2017-07-05 08:50:19.895393 7ff3beef5700  1 -- 10.21.4.1:0/3313807338 <== osd.27 10.21.4.113:6807/2176843 64 ==== osd_op_reply(2581 notify.6 [watch unwatch cookie 140686606271968] v31545'6707 uv6306 ondisk = 0) v7 ==== 128+0+0 (1081188691 0 0) 0x7ff3900014f0 con 0x7ff427304ae0
2017-07-05 08:50:19.895454 7ff4250219c0 20 remove_watcher() i=6
2017-07-05 08:50:19.895485 7ff4250219c0  1 -- 10.21.4.1:0/3313807338 --> 10.21.4.111:6802/3632911 -- osd_op(client.5971646.0:2582 24.84ada7c9 notify.7 [watch unwatch cookie 140686606274544] snapc 0=[] ondisk+write+known_if_redirected e31545) v7 -- ?+0 0x7ff427300db0 con 0x7ff4273083d0
2017-07-05 08:50:19.897426 7ff3becf3700  1 -- 10.21.4.1:0/3313807338 <== osd.1 10.21.4.111:6802/3632911 96 ==== osd_op_reply(2582 notify.7 [watch unwatch cookie 140686606274544] v31545'10059 uv9501 ondisk = 0) v7 ==== 128+0+0 (4260323257 0 0) 0x7ff388001630 con 0x7ff4273083d0
2017-07-05 08:50:19.897492 7ff4250219c0 20 remove_watcher() i=7
2017-07-05 08:50:19.898051 7ff4250219c0  1 -- 10.21.4.1:0/3313807338 mark_down 0x7ff4273083d0 -- 0x7ff427308c50
2017-07-05 08:50:19.898108 7ff4250219c0  1 -- 10.21.4.1:0/3313807338 mark_down 0x7ff4272f48f0 -- 0x7ff4272f5370
2017-07-05 08:50:19.898150 7ff4250219c0  1 -- 10.21.4.1:0/3313807338 mark_down 0x7ff42730bca0 -- 0x7ff42730c510
2017-07-05 08:50:19.898203 7ff4250219c0  1 -- 10.21.4.1:0/3313807338 mark_down 0x7ff4272eb9f0 -- 0x7ff4272ea730
2017-07-05 08:50:19.898257 7ff4250219c0  1 -- 10.21.4.1:0/3313807338 mark_down 0x7ff42732aee0 -- 0x7ff427329c20
2017-07-05 08:50:19.898296 7ff4250219c0  1 -- 10.21.4.1:0/3313807338 mark_down 0x7ff4272e8480 -- 0x7ff4272e71c0
2017-07-05 08:50:19.898343 7ff4250219c0  1 -- 10.21.4.1:0/3313807338 mark_down 0x7ff4272fd830 -- 0x7ff4272fc570
2017-07-05 08:50:19.898469 7ff4250219c0  1 -- 10.21.4.1:0/3313807338 mark_down 0x7ff427302b10 -- 0x7ff427301800
2017-07-05 08:50:19.898518 7ff4250219c0  1 -- 10.21.4.1:0/3313807338 mark_down 0x7ff4272efa00 -- 0x7ff4272ee6f0
2017-07-05 08:50:19.898642 7ff4250219c0  1 -- 10.21.4.1:0/3313807338 mark_down 0x7ff4272f1120 -- 0x7ff4272f20f0
2017-07-05 08:50:19.898690 7ff4250219c0  1 -- 10.21.4.1:0/3313807338 mark_down 0x7ff427304ae0 -- 0x7ff427305390
2017-07-05 08:50:19.898739 7ff4250219c0  1 -- 10.21.4.1:0/3313807338 mark_down 0x7ff4272f9cb0 -- 0x7ff4272f89f0
2017-07-05 08:50:19.898780 7ff4250219c0  1 -- 10.21.4.1:0/3313807338 mark_down 0x7ff4272e0600 -- 0x7ff4272df2f0
2017-07-05 08:50:19.898820 7ff4250219c0  1 -- 10.21.4.1:0/3313807338 mark_down 0x7ff4272e1d40 -- 0x7ff4272e2da0
2017-07-05 08:50:19.898938 7ff4250219c0  1 -- 10.21.4.1:0/3313807338 mark_down 0x7ff4272d8160 -- 0x7ff4272d6160
2017-07-05 08:50:19.899192 7ff4250219c0  1 -- 10.21.4.1:0/3313807338 mark_down_all
2017-07-05 08:50:19.899568 7ff4250219c0  1 -- 10.21.4.1:0/3313807338 shutdown complete.

I'll have a look at the tracker on how to open an issue there.
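
In case it helps anyone reading a trace like the one above, here is a minimal Python sketch for picking out the replies that came back with a non-zero result. It is only an illustration: the regular expression and the function name failed_replies are my own assumptions, based solely on the osd_op_reply lines shown in this message, and other Ceph versions may format these lines differently.

```python
import re

# Assumed layout, taken from the osd_op_reply lines in this message:
#   osd_op_reply(<tid> <object> [<ops>] <version> <user_version> ondisk = <result> ...
# \s+ is used between fields so that mail-wrapped log lines still match.
REPLY_RE = re.compile(
    r"osd_op_reply\((\d+)\s+(\S+)\s+\[[^\]]*\]\s+\S+\s+\S+\s+ondisk\s+=\s+(-?\d+)"
)


def failed_replies(log_text):
    """Return (tid, object_name, result) for every reply with a non-zero result."""
    failures = []
    for match in REPLY_RE.finditer(log_text):
        tid = int(match.group(1))
        object_name = match.group(2)
        result = int(match.group(3))
        if result != 0:
            failures.append((tid, object_name, result))
    return failures


# Two replies copied from the trace: one successful, one failing.
sample = (
    "osd_op_reply(2573 .dir.be-east.5582981.76.0 [call] v0'0 uv65572318 "
    "ondisk = 0) v7 ==== 145+0+385625\n"
    "osd_op_reply(2574 .dir.be-east.5582981.76.0 [call] v0'0 uv0 ondisk = -4 "
    "((4) Interrupted system call)) v7 ==== 145+0+0\n"
)

for tid, object_name, result in failed_replies(sample):
    print(tid, object_name, result)
```

On the sample it reports only request 2574 against .dir.be-east.5582981.76.0, which is the index shard object the failing rgw.bi_list call was sent to.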

Regards,
Maarten

On Wed, Jul 5, 2017 at 8:26 AM, Orit Wasserman <[email protected]> wrote:

> Hi Maarten,
>
> On Tue, Jul 4, 2017 at 9:46 PM, Maarten De Quick <[email protected]>
> wrote:
>
>> Hi,
>>
>> Background: We're having issues with our index pool (slow requests and
>> timeouts cause an OSD to crash and trigger a recovery, which leads to
>> application issues). We know we have very big buckets (e.g. a bucket of
>> 77 million objects with only 16 shards) that need resharding, so we were
>> looking at the resharding process.
>>
>> The first thing we would like to do is make a backup of the bucket index,
>> but this failed with:
>>
>> # radosgw-admin -n client.radosgw.be-west-3 bi list
>> --bucket=priv-prod-up-alex > /var/backup/priv-prod-up-alex.list.backup
>> 2017-07-03 21:28:30.325613 7f07fb8bc9c0  0 System already converted
>> ERROR: bi_list(): (4) Interrupted system call
>>
>>
> What version are you using?
> Can you rerun the command with --debug-rgw=20 --debug-ms=1?
> Also please open a tracker issue (for rgw) with all the information.
>
> Thanks,
> Orit
>
>> When I grep for "idx" and count the matches:
>>  # grep idx priv-prod-up-alex.list.backup | wc -l
>> 2294942
>> When I do a bucket stats for that bucket I get:
>> # radosgw-admin -n client.radosgw.be-west-3 bucket stats
>> --bucket=priv-prod-up-alex | grep num_objects
>> 2017-07-03 21:33:05.776499 7faca49b89c0  0 System already converted
>>             "num_objects": 20148575
>>
>> It looks like there are 18 million objects missing and the backup is not
>> complete (not sure if that's a correct assumption?). We're also afraid that
>> the resharding command will face the same issue.
>> Has anyone seen this behaviour before or any thoughts on how to fix it?
>>
>> We were also wondering if we really need the backup. As the resharding
>> process creates a completely new index and keeps the old bucket, is there
>> maybe a possibility to relink your bucket to the old bucket in case of
>> issues? Or am I missing something important here?
>>
>> Any help would be greatly appreciated, thanks!
>>
>> Regards,
>> Maarten
>>
>> _______________________________________________
>> ceph-users mailing list
>> [email protected]
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>
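
The object-count comparison in the quoted message can be written out as a small worked example. This is a rough sketch only: the function name index_coverage is hypothetical, and it assumes each object contributes exactly one "idx" line to the bi list output, which may not hold for every bucket.

```python
def index_coverage(listed_entries, expected_objects):
    """Return (missing_count, percent_listed) for a partial index listing."""
    missing = expected_objects - listed_entries
    percent_listed = round(100.0 * listed_entries / expected_objects, 1)
    return missing, percent_listed


# Figures from the quoted message: the grep count of "idx" lines in the
# backup file and num_objects from bucket stats.
missing, percent_listed = index_coverage(2294942, 20148575)
print(missing, percent_listed)
```

With those two figures the gap comes to roughly 17.85 million entries, which matches the "18 million objects missing" estimate.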
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
