Hello Jason,
according to this, latency between client and OSD should not be the problem:
the high share of user time in the earlier measurement (and in the runs below)
shows the rbd client spending most of its time computing on the client side,
so network communication should not be the limiting factor.
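For illustration, this is a rough Python sketch of that check (pool name and
--id are the ones used below; this is only my ad-hoc script, not part of any
tooling): if the child's user CPU time is close to the wall-clock time, the
client is CPU-bound and network latency cannot be the dominating factor.

#!/usr/bin/env python
import resource
import subprocess
import time

# Same invocation as below; adjust pool and id for your setup.
cmd = ["rbd", "ls", "-l",
       "-p", "RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c",
       "--rbd_concurrent_management_ops=1", "--id", "xen_test"]

start = time.time()
subprocess.check_output(cmd)
wall = time.time() - start

# Resource usage of the waited-for child (the rbd process).
child = resource.getrusage(resource.RUSAGE_CHILDREN)
print("real %.2fs  user %.2fs  sys %.2fs" % (wall, child.ru_utime, child.ru_stime))
print("CPU-bound fraction: %.0f%%" % (100.0 * (child.ru_utime + child.ru_stime) / wall))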
Finding the involved OSD:
# ceph osd map RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c rbd_directory
osdmap e7570 pool 'RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c' (14) object 'rbd_directory' -> pg 14.30a98c1c (14.1c) -> up ([36,0,38], p36) acting ([36,0,38], p36)
# ceph osd find osd.36
{
    "osd": 36,
    "ip": "10.23.27.149:6826/7195",
    "crush_location": {
        "host": "ceph-ssd-s39",
        "root": "default"
    }
}
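The same lookup can also be scripted; here is a minimal sketch with the
python-rados bindings (the mon command names and result fields below are what
I believe "ceph osd map" and "ceph osd find" use, so please treat them as an
assumption rather than a reference):

import json
import rados

POOL = "RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c"

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()

# Which PG/OSDs hold the rbd_directory object? (equivalent of "ceph osd map")
ret, out, errs = cluster.mon_command(
    json.dumps({"prefix": "osd map", "pool": POOL,
                "object": "rbd_directory", "format": "json"}), b"")
primary = json.loads(out)["up_primary"]

# Where does the primary OSD live? (equivalent of "ceph osd find")
ret, out, errs = cluster.mon_command(
    json.dumps({"prefix": "osd find", "id": primary, "format": "json"}), b"")
print(json.loads(out)["crush_location"]["host"])

cluster.shutdown()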
ssh ceph-ssd-s39
# nuttcp -w1m ceph-mon-s43
11186.3391 MB / 10.00 sec = 9381.8890 Mbps 12 %TX 32 %RX 0 retrans 0.15 msRTT
# time rbd ls -l -p RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c --rbd_concurrent_management_ops=1 --id xen_test
NAME SIZE PARENT FMT PROT LOCK
RBD-0192938e-cb4b-4ee1-9988-b8145704ac81 20480M RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c/RBD-8b2cfe76-44b7-4393-b376-f675366831c3@BASE 2
RBD-0192938e-cb4b-4ee1-9988-b8145704ac81@BASE 20480M RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c/RBD-8b2cfe76-44b7-4393-b376-f675366831c3@BASE 2 yes
...
RBD-feb32ab0-a5ee-44e6-9089-486e91ee8af3 20480M RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c/RBD-bbbc2ce0-4ad3-44ae-a52f-e57df0441e27@BASE 2
__srlock__ 0 2
real 0m23.667s
user 0m15.949s
sys 0m1.276s
# time rbd ls -l -p RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c --rbd_concurrent_management_ops=1 --id xen_test
NAME SIZE PARENT FMT PROT LOCK
RBD-0192938e-cb4b-4ee1-9988-b8145704ac81 20480M RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c/RBD-8b2cfe76-44b7-4393-b376-f675366831c3@BASE 2
RBD-0192938e-cb4b-4ee1-9988-b8145704ac81@BASE 20480M RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c/RBD-8b2cfe76-44b7-4393-b376-f675366831c3@BASE 2 yes
...
RBD-feb32ab0-a5ee-44e6-9089-486e91ee8af3 20480M RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c/RBD-bbbc2ce0-4ad3-44ae-a52f-e57df0441e27@BASE 2
....
__srlock__ 0 2
real 0m13.937s
user 0m14.404s
sys 0m1.089s
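For reference, the per-image work Jason describes below roughly corresponds
to this python-rbd sketch, i.e. one open plus a snapshot walk for every image,
each of which costs round-trips; this is only an illustration of what the
long listing has to do, not how we actually query the pool:

import rados
import rbd

POOL = "RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c"

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
ioctx = cluster.open_ioctx(POOL)
try:
    for name in rbd.RBD().list(ioctx):                       # one read of rbd_directory
        with rbd.Image(ioctx, name, read_only=True) as image:  # open = several round-trips
            size = image.size()
            try:
                parent = "%s/%s@%s" % image.parent_info()       # clone parent, if any
            except rbd.ImageNotFound:
                parent = "-"
            snaps = [s["name"] for s in image.list_snaps()]     # more round-trips
            print(name, size, parent, snaps)
finally:
    ioctx.close()
    cluster.shutdown()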
Regards
Marc
On 25.04.2018 at 16:38, Jason Dillaman wrote:
> I'd check your latency between your client and your cluster. On my
> development machine w/ only a single OSD running and 200 clones, each
> with 1 snapshot, "rbd ls -l" only takes a couple of seconds for me:
>
> $ time rbd ls -l --rbd_concurrent_management_ops=1 | wc -l
> 403
>
> real 0m1.746s
> user 0m1.136s
> sys 0m0.169s
>
> Also, I have to ask, but how often are you expecting to scrape the
> images from the pool? The long directory list involves opening each image
> in the pool (which involves numerous round-trips to the OSDs) plus
> iterating through each snapshot (which also involves round-trips).
>
> On Wed, Apr 25, 2018 at 10:13 AM, Marc Schöchlin <[email protected]> wrote:
>> Hello Piotr,
>>
>> I updated the issue.
>> (https://tracker.ceph.com/issues/23853?next_issue_id=23852&prev_issue_id=23854)
>>
>> # time rbd ls -l --pool RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c --rbd_concurrent_management_ops=1
>> NAME SIZE PARENT FMT PROT LOCK
>> RBD-feb32ab0-a5ee-44e6-9089-486e91ee8af3 20480M RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c/RBD-bbbc2ce0-4ad3-44ae-a52f-e57df0441e27@BASE 2
>> __srlock__ 0 2
>> ....
>> real 0m18.562s
>> user 0m12.513s
>> sys 0m0.793s
>>
>> I also attached a JSON dump of my pool structure.
>>
>> Regards
>> Marc
>>
>> On 25.04.2018 at 14:46, Piotr Dałek wrote:
>>> On 18-04-25 02:29 PM, Marc Schöchlin wrote:
>>>> Hello list,
>>>>
>>>> we are trying to integrate a storage repository in XenServer.
>>>> (I also described the problem as an issue in the Ceph bug tracker:
>>>> https://tracker.ceph.com/issues/23853)
>>>>
>>>> Summary:
>>>>
>>>> The slowness is a real pain for us, because it prevents the Xen
>>>> storage repository from working efficiently.
>>>> Gathering information for Xen pools with hundreds of virtual machines
>>>> (using "--format json") would be a real pain...
>>>> The high user time consumption and the really huge number of threads
>>>> suggest that there is something really inefficient in the "rbd"
>>>> utility.
>>>>
>>>> So what can I do to make "rbd ls -l" faster, or to get comparable
>>>> information about the snapshot hierarchy?
>>> Can you run this command with the extra argument
>>> "--rbd_concurrent_management_ops=1" and share the timing of that?
>>>