Sridhar, 
  
Thanks a lot for this explanation. It's clearer now. 
  
So in the end (at least with the 'balanced' profile), client ops get a lower 
bound and no upper limit, and IOPS are balanced between client and cluster ops. 
   
 
Regards, 
Frédéric.  

   

-----Original Message-----

From: Sridhar <ssesh...@redhat.com>
To: Frédéric <frederic.n...@univ-lorraine.fr>
Cc: ceph-users <ceph-us...@ceph.com>
Sent: Wednesday, January 10, 2024 08:15 CET
Subject: Re: [ceph-users] How does mclock work?

  
Hello Frédéric, 
  
Please see answers below. 
    
Could someone please explain how mclock works regarding reads and writes? Does 
mclock intervene on both read and write iops? Or only on reads or only on 
writes?  
  
mClock schedules both read and write ops. 
    
And what type of underlying hardware performance is calculated and considered 
by mclock? Seems to be only write performance.  
  
Random write performance is considered for setting the maximum IOPS capacity of 
an OSD. This along with the sequential bandwidth 
capability of the OSD is used to calculate the cost per IO that is internally 
used by mClock for scheduling Ops. In addition, the mClock 
profiles use the capacity information to allocate reservation and limit for 
different classes of service (e.g., client, background-recovery, 
scrub, snaptrim etc.). 
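
To make the relationship between the two capacity settings concrete, here is
a minimal Python sketch. It assumes the cost per IO is simply the sequential
bandwidth divided by the IOPS capacity; the function name and the sample
numbers are illustrative and are not taken from the Ceph source, and how that
cost is then charged to each individual op is not modelled here.

```python
def cost_per_io_bytes(max_sequential_bandwidth_bps: float,
                      max_capacity_iops: float) -> float:
    """Approximate cost of one IO, in bytes.

    Assumption: cost per IO = sequential bandwidth / IOPS capacity.
    This is a simplified model, not the scheduler's actual code.
    """
    if max_capacity_iops <= 0:
        raise ValueError("IOPS capacity must be positive")
    return max_sequential_bandwidth_bps / max_capacity_iops


MIB = 1024 * 1024

# Illustrative values only: an HDD-like and an SSD-like device.
hdd_cost = cost_per_io_bytes(150 * MIB, 300)
ssd_cost = cost_per_io_bytes(1200 * MIB, 20000)

print(f"HDD-like device: {hdd_cost:.0f} bytes per IO")
print(f"SSD-like device: {ssd_cost:.0f} bytes per IO")
```

Under this model, overstating the IOPS capacity of a device lowers the
per-IO cost, which is one reason the configured capacity should match the
hardware.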
  
The write performance is used to set a lower bound on the amount of bandwidth 
to be allocated for different classes of service. For example, 
the 'balanced' profile allocates 50% of the OSD's IOPS capacity to client ops. 
In other words, a minimum guarantee of 50% of the OSD's 
bandwidth is allocated to client ops (read or write). If you look at the 
'balanced' profile, there is no upper limit set for client ops (i.e., the limit 
is set to MAX), which means that reads can potentially use the maximum possible 
bandwidth (i.e., they are not constrained by the max IOPS capacity) if there 
are no other competing ops.  
  
Please see 
https://docs.ceph.com/en/reef/rados/configuration/mclock-config-ref/#built-in-profiles
 for more information about mClock profiles. 
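
The client allocation of the 'balanced' profile described above can be
sketched in Python as follows. Only the 50% reservation and the absent upper
limit come from this thread; the class name, the helper functions and the
300 IOPS capacity are hypothetical and chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassAllocation:
    """Share of an OSD's IOPS capacity given to one class of service."""
    reservation_fraction: float  # minimum guaranteed share (lower bound)
    limit_fraction: float        # upper bound; math.inf means "MAX"


# Client ops in the 'balanced' profile: 50% reserved, no upper limit.
BALANCED_CLIENT = ClassAllocation(reservation_fraction=0.5,
                                  limit_fraction=math.inf)


def reserved_iops(alloc: ClassAllocation, capacity_iops: float) -> float:
    """IOPS guaranteed to the class even when other ops compete."""
    return alloc.reservation_fraction * capacity_iops


def limit_iops(alloc: ClassAllocation, capacity_iops: float) -> float:
    """IOPS ceiling for the class; infinite when no limit is set."""
    if math.isinf(alloc.limit_fraction):
        return math.inf
    return alloc.limit_fraction * capacity_iops


capacity = 300.0  # hypothetical configured IOPS capacity of the OSD
print("guaranteed client IOPS:", reserved_iops(BALANCED_CLIENT, capacity))
print("client IOPS ceiling:", limit_iops(BALANCED_CLIENT, capacity))
```

With no ceiling, client reads on an otherwise idle OSD are bounded by the
device itself rather than by the configured capacity.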
    
The mclock documentation shows HDD- and SSD-specific configuration options 
(capacity and sequential bandwidth) but nothing regarding hybrid setups, and 
these configuration options do not distinguish reads and writes. But read and 
write performance are often not on par for a single drive and even less so when 
using hybrid setups. 
  
With hybrid setups (RocksDB+WAL on SSDs or NVMes and Data on HDD), if mclock 
only considers write performance, it may fail to properly schedule read iops 
(does mclock schedule read iops?) as the calculated iops capacity would be way 
too high for reads. 
  
With HDD only setups (RocksDB+WAL+Data on HDD), if mclock only considers write 
performance, the OSD may not take advantage of higher read performance. 
  
Can someone please shed some light on this?  
  
As mentioned above, as long as there are no competing ops, the mClock profiles 
ensure that there is nothing constraining client 
ops from using the full available bandwidth of an OSD for both reads and writes 
regardless of the type of setup (hybrid, HDD, 
SSD) being employed. The important aspect is to ensure that the set IOPS 
capacity for the OSD reflects a fairly accurate 
representation of the underlying device capability. This is because the 
reservation criteria based on the IOPS capacity help 
maintain an acceptable level of performance with other active competing ops. 
  
You could run some synthetic benchmarks to ensure that read and write 
performance are along expected lines with the 
default mClock profile to confirm the above. 
  
I hope this helps. 
  
-Sridhar    
                 
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io