Here is the strace result.

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
99.94    0.236170         790       299         5 futex
  0.06    0.000136           0       365           brk
  0.00    0.000000           0        41         2 read
  0.00    0.000000           0        48           write
  0.00    0.000000           0        72        27 open
  0.00    0.000000           0        43           close
  0.00    0.000000           0        10         5 stat
  0.00    0.000000           0        36           fstat
  0.00    0.000000           0         1           lseek
  0.00    0.000000           0       103           mmap
  0.00    0.000000           0        70           mprotect
  0.00    0.000000           0        19           munmap
  0.00    0.000000           0        11           rt_sigaction
  0.00    0.000000           0        32           rt_sigprocmask
  0.00    0.000000           0        26        26 access
  0.00    0.000000           0         3           pipe
  0.00    0.000000           0        19           clone
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         7           uname
  0.00    0.000000           0        12           fcntl
  0.00    0.000000           0         1           getrlimit
  0.00    0.000000           0         2           sysinfo
  0.00    0.000000           0         1           getuid
  0.00    0.000000           0         1           prctl
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         1           gettid
  0.00    0.000000           0         3           epoll_create
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         1           set_robust_list
  0.00    0.000000           0         1           membarrier
------ ----------- ----------- --------- --------- ----------------
100.00    0.236306                  1231        65 total
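
( For anyone wanting to reproduce this: the summary format above is what
strace's -c flag produces. The run would be something along these lines, with
-f added to also follow the threads librbd spawns, and RBD_POOL/IMAGE as a
placeholder for the real image spec:

    strace -c -f rbd du RBD_POOL/IMAGE
)
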
From: David Turner <drakonst...@gmail.com>
Sent: Friday, 1 March 2019 11:46 AM
To: Glen Baars <g...@onsitecomputers.com.au>
Cc: Wido den Hollander <w...@42on.com>; ceph-users <ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] Mimic 13.2.4 rbd du slowness

Have you used strace on the du command to see what it's spending its time doing?

On Thu, Feb 28, 2019, 8:45 PM Glen Baars <g...@onsitecomputers.com.au> wrote:
Hello Wido,

The cluster layout is as follows:

3 x Monitor hosts ( 2 x 10Gbit bonded )
9 x OSD hosts (
    2 x 10Gbit bonded,
    LSI CacheCade and write-cache drives set to single,
    all HDD in this pool,
    no separate DB / WAL )
168 OSD disks

With the write cache and the SSD read cache on the LSI card it seems to
perform well.

No major increase in OSD disk usage or CPU usage. The rbd du process uses 100%
of a single 2.4GHz core while running - I think that is the limiting factor.

I have just tried removing most of the snapshots for that volume ( from 14 
snapshots down to 1 snapshot ) and the rbd du command now takes around 2-3 
minutes.
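
( The snapshot count is easy to check; rbd du reports usage for the image and
every snapshot, so each snapshot is another object map to walk. For example,
with RBD_POOL/IMAGE as a placeholder:

    rbd snap ls RBD_POOL/IMAGE
)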

Kind regards,
Glen Baars

-----Original Message-----
From: Wido den Hollander <w...@42on.com>
Sent: Thursday, 28 February 2019 5:05 PM
To: Glen Baars <g...@onsitecomputers.com.au>; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Mimic 13.2.4 rbd du slowness



On 2/28/19 9:41 AM, Glen Baars wrote:
> Hello Wido,
>
> I have looked at the libvirt code and there is a check to ensure that 
> fast-diff is enabled on the image and only then does it try to get the real 
> disk usage. The issue for me is that even with fast-diff enabled it takes 
> 25min to get the space usage for a 50TB image.
>
> I had considered turning off fast-diff on the large images to get around
> the issue, but I think that will hurt my snapshot removal times ( untested ).
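>
> ( If I do try it, the toggle itself would presumably just be - again,
> untested:
>
>     rbd feature disable RBD_POOL/IMAGE fast-diff
>
> with RBD_POOL/IMAGE standing in for the real image spec. )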
>

Can you tell a bit more about the Ceph cluster? HDD? SSD? DB and WAL on SSD?

Do you see OSDs spike in CPU or disk I/O when you do an 'rbd du' on these images?

Wido

> I can't see any other way in the code to bypass the disk usage check, but I
> am not that familiar with the code.
>
> ------------------------------
>     if (volStorageBackendRBDUseFastDiff(features)) {
>         VIR_DEBUG("RBD image %s/%s has fast-diff feature enabled. "
>                   "Querying for actual allocation",
>                   def->source.name, vol->name);
>
>         if (virStorageBackendRBDSetAllocation(vol, image, &info) < 0)
>             goto cleanup;
>     } else {
>         vol->target.allocation = info.obj_size * info.num_objs;
>     }
> ------------------------------
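>
> ( For scale: with the default 4MiB object size, a 50TB image is on the order
> of 13 million objects, so the else branch is a single multiplication, while
> the fast-diff path has to walk an object-map entry for every object, for the
> image and again for each snapshot. )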
>
> Kind regards,
> Glen Baars
>
> -----Original Message-----
> From: Wido den Hollander <w...@42on.com>
> Sent: Thursday, 28 February 2019 3:49 PM
> To: Glen Baars <g...@onsitecomputers.com.au>;
> ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Mimic 13.2.4 rbd du slowness
>
>
>
> On 2/28/19 2:59 AM, Glen Baars wrote:
>> Hello Ceph Users,
>>
>> Has anyone found a way to improve the speed of the rbd du command on large
>> rbd images? I have object-map and fast-diff enabled - no invalid flags on
>> the image or its snapshots.
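>>
>> ( Checked with something like the following, where RBD_POOL/IMAGE is a
>> placeholder; an invalid map would show up as "object map invalid" or
>> "fast diff invalid" under flags:
>>
>>     rbd info RBD_POOL/IMAGE | grep -E 'features|flags'
>> )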
>>
>> We recently upgraded our Ubuntu 16.04 KVM servers for Cloudstack to Ubuntu
>> 18.04. This upgrades libvirt to version 4. When libvirt 4 adds an rbd pool
>> it discovers all images in the pool and tries to get their disk usage. We
>> are seeing a 50TB image take 25min. The pool has over 300TB of images in
>> it, and it takes hours for libvirt to start.
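>>
>> ( The same scan can be triggered by hand, which makes it easy to time
>> libvirt's part in isolation - the libvirt pool name is a placeholder:
>>
>>     virsh pool-refresh RBD_POOL
>> )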
>>
>
> This is actually a pretty bad thing imho, as a lot of the images people will
> be using do not have fast-diff enabled (images from the past), and that will
> kill their performance.
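>
> ( Those older images can at least have the features added after the fact -
> exclusive-lock has to be enabled before object-map/fast-diff, and the map
> needs a one-off rebuild; image spec is a placeholder:
>
>     rbd feature enable RBD_POOL/IMAGE exclusive-lock
>     rbd feature enable RBD_POOL/IMAGE object-map fast-diff
>     rbd object-map rebuild RBD_POOL/IMAGE
> )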
>
> Isn't there a way to turn this off in libvirt?
>
> Wido
>
>> We can replicate the issue without libvirt by just running rbd du on the
>> large images. The limiting factor is the CPU on the rbd du command; it uses
>> 100% of a single core.
>>
>> Our cluster is completely bluestore/mimic 13.2.4. 168 OSDs, 12 Ubuntu 16.04 
>> hosts.
>>
>> Kind regards,
>> Glen Baars
>>
>
