Try from the admin node:

ceph osd df
ceph osd status

Thanks,
Joe
 

>>> <c...@elchaka.de> 2/10/2020 10:44 AM >>>
Hello MJ,

Perhaps your PGs are unbalanced?

ceph osd df tree

Greetz
Mehmet 

On 10 February 2020 14:58:25 CET, lists <li...@merit.unu.edu> wrote:
>Hi,
>
>We would like to replace the current Seagate ST4000NM0034 HDDs in our 
>ceph cluster with SSDs. Before doing that, we would like to check the 
>typical usage of our current drives over the last years, so we can 
>select the best (price/performance/endurance) SSD to replace them with.
>
>I am trying to extract this info from the fields "Blocks received from 
>initiator" / "Blocks sent to initiator", as these are the fields 
>smartctl reports for the Seagate disks. But the numbers seem strange, 
>and I would like to request feedback here.
>
>Three nodes, all equal, 8 OSDs per node, all 4TB ST4000NM0034 
>(filestore) HDDs with SSD-based journals:
>
>> root@node1:~# ceph osd crush tree
>> ID CLASS WEIGHT   TYPE NAME
>> -1      87.35376 root default
>> -2      29.11688     host node1
>>  0   hdd  3.64000         osd.0
>>  1   hdd  3.64000         osd.1
>>  2   hdd  3.63689         osd.2
>>  3   hdd  3.64000         osd.3
>> 12   hdd  3.64000         osd.12
>> 13   hdd  3.64000         osd.13
>> 14   hdd  3.64000         osd.14
>> 15   hdd  3.64000         osd.15
>> -3      29.12000     host node2
>>  4   hdd  3.64000         osd.4
>>  5   hdd  3.64000         osd.5
>>  6   hdd  3.64000         osd.6
>>  7   hdd  3.64000         osd.7
>> 16   hdd  3.64000         osd.16
>> 17   hdd  3.64000         osd.17
>> 18   hdd  3.64000         osd.18
>> 19   hdd  3.64000         osd.19
>> -4      29.11688     host node3
>>  8   hdd  3.64000         osd.8
>>  9   hdd  3.64000         osd.9
>> 10   hdd  3.64000         osd.10
>> 11   hdd  3.64000         osd.11
>> 20   hdd  3.64000         osd.20
>> 21   hdd  3.64000         osd.21
>> 22   hdd  3.64000         osd.22
>> 23   hdd  3.63689         osd.23
>
>We are looking at the numbers from smartctl, and basing our calculations 
>on the following output for each individual OSD:
>> Vendor (Seagate) cache information
>>   Blocks sent to initiator = 3783529066
>>   Blocks received from initiator = 3121186120
>>   Blocks read from cache and sent to initiator = 545427169
>>   Number of read and write commands whose size <= segment size = 93877358
>>   Number of read and write commands whose size > segment size = 2290879
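>
>(As an illustration, a minimal sketch of how counters like these could be 
>collected per node; it assumes the smartctl output format shown above and 
>the sdX device names used in the table below:)
>
>> # sketch: per-disk read/write ratio from the SMART vendor cache counters
>> for dev in sda sdb sdc sdd sdg sdh sdi sdj; do   # data disks on this node
>>     smartctl -a "/dev/$dev" | awk -v d="$dev" '
>>         # grab the two lifetime counters; skip the "read from cache" line
>>         /Blocks sent to initiator/ && !/cache/ { sent = $NF }
>>         /Blocks received from initiator/       { recv = $NF }
>>         END {
>>             total = sent + recv
>>             if (total > 0)
>>                 printf "%s: sent=%s received=%s read%%=%.2f write%%=%.2f\n",
>>                        d, sent, recv, 100 * sent / total, 100 * recv / total
>>         }'
>> done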
>
>I created the following spreadsheet:
>
>>         blocks sent     blocks received  total blocks
>>         to initiator    from initiator   calculated     read%    write%   aka
>> node1
>> osd0    905060564       1900663448       2805724012     32,26%   67,74%   sda
>> osd1    2270442418      3756215880       6026658298     37,67%   62,33%   sdb
>> osd2    3531938448      3940249192       7472187640     47,27%   52,73%   sdc
>> osd3    2824808123      3130655416       5955463539     47,43%   52,57%   sdd
>> osd12   1956722491      1294854032       3251576523     60,18%   39,82%   sdg
>> osd13   3410188306      1265443936       4675632242     72,94%   27,06%   sdh
>> osd14   3765454090      3115079112       6880533202     54,73%   45,27%   sdi
>> osd15   2272246730      2218847264       4491093994     50,59%   49,41%   sdj
>>
>> node2
>> osd4    3974937107      740853712        4715790819     84,29%   15,71%   sda
>> osd5    1181377668      2109150744       3290528412     35,90%   64,10%   sdb
>> osd6    1903438106      608869008        2512307114     75,76%   24,24%   sdc
>> osd7    3511170043      724345936        4235515979     82,90%   17,10%   sdd
>> osd16   2642731906      3981984640       6624716546     39,89%   60,11%   sdg
>> osd17   3994977805      3703856288       7698834093     51,89%   48,11%   sdh
>> osd18   3992157229      2096991672       6089148901     65,56%   34,44%   sdi
>> osd19   279766405       1053039640       1332806045     20,99%   79,01%   sdj
>>
>> node3
>> osd8    3711322586      234696960        3946019546     94,05%   5,95%    sda
>> osd9    1203912715      3132990000       4336902715     27,76%   72,24%   sdb
>> osd10   912356010       1681434416       2593790426     35,17%   64,83%   sdc
>> osd11   810488345       2626589896       3437078241     23,58%   76,42%   sdd
>> osd20   1506879946      2421596680       3928476626     38,36%   61,64%   sdg
>> osd21   2991526593      7525120          2999051713     99,75%   0,25%    sdh
>> osd22   29560337        3226114552       3255674889     0,91%    99,09%   sdi
>> osd23   2019195656      2563506320       4582701976     44,06%   55,94%   sdj
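>
>(The percentages are calculated as read% = blocks sent / (blocks sent + 
>blocks received), and write% analogously; e.g. for osd0: 
>905060564 / 2805724012 ≈ 32,26% reads.)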
>
>But as can be seen above, this results in some very strange numbers: for 
>example node3/osd21, node2/osd19 and node3/osd8 look very unlikely.
>
>So, probably we're doing something wrong in our logic here.
>
>Can someone explain what we're doing wrong, and is it possible to obtain 
>stats like these from ceph directly? Does ceph keep historical stats like 
>the above?
>
>MJ

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
