I recently upgraded from 16.2.x to 18.2.x and migrated to cephadm. Since the
switch the cluster has been scrubbing constantly, 24/7, with up to 50 PGs
scrubbing and up to 20 deep scrubs running simultaneously in a cluster that
has only 12 OSDs in use.
Despite all that scrubbing, it still regularly raises a 'pgs not scrubbed in
time' warning.
I have tried adjusting various settings, such as osd_deep_scrub_interval,
osd_max_scrubs, mds_max_scrub_ops_in_progress, etc., but they all seem to be
ignored.
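For reference, the commands I used were along these lines (the option names are the real ones I tried; the values shown here are only examples from my experiments, not what is currently configured):

  # store scrub-related limits in the mon config database
  ceph config set osd osd_max_scrubs 1
  ceph config set osd osd_deep_scrub_interval 1209600   # 14 days, in seconds
  ceph config set mds mds_max_scrub_ops_in_progress 5

  # confirm the values were stored
  ceph config get osd osd_max_scrubs
  ceph config get osd osd_deep_scrub_interval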
Please advise.
Here is the output of ceph config dump:
WHO         MASK  LEVEL     OPTION                                        VALUE                      RO
global            advanced  auth_client_required                          cephx                      *
global            advanced  auth_cluster_required                         cephx                      *
global            advanced  auth_service_required                         cephx                      *
global            advanced  auth_supported                                cephx                      *
global            basic     container_image                               quay.io/ceph/ceph@sha256:aca35483144ab3548a7f670db9b79772e6fc51167246421c66c0bd56a6585468  *
global            basic     device_failure_prediction_mode                local
global            advanced  mon_allow_pool_delete                         true
global            advanced  mon_data_avail_warn                           20
global            advanced  mon_max_pg_per_osd                            400
global            advanced  osd_max_pg_per_osd_hard_ratio                 10.000000
global            advanced  osd_pool_default_pg_autoscale_mode            on
mon               advanced  auth_allow_insecure_global_id_reclaim         false
mon               advanced  mon_crush_min_required_version                firefly                    *
mon               advanced  mon_warn_on_pool_no_redundancy                false
mon               advanced  public_network                                10.79.0.0/16               *
mgr               advanced  mgr/balancer/active                           true
mgr               advanced  mgr/balancer/mode                             upmap
mgr               advanced  mgr/cephadm/manage_etc_ceph_ceph_conf_hosts   label:admin                *
mgr               advanced  mgr/cephadm/migration_current                 6                          *
mgr               advanced  mgr/dashboard/GRAFANA_API_PASSWORD            admin                      *
mgr               advanced  mgr/dashboard/GRAFANA_API_SSL_VERIFY          false                      *
mgr               advanced  mgr/dashboard/GRAFANA_API_URL                 https://10.79.79.12:3000   *
mgr               advanced  mgr/dashboard/PROMETHEUS_API_HOST             http://10.79.79.12:9095    *
mgr               advanced  mgr/devicehealth/enable_monitoring            true
mgr               advanced  mgr/orchestrator/orchestrator                 cephadm
osd               advanced  osd_map_cache_size                            250
osd               advanced  osd_map_share_max_epochs                      50
osd               advanced  osd_mclock_profile                            high_client_ops
osd               advanced  osd_pg_epoch_persisted_max_stale              50
osd.0             basic     osd_mclock_max_capacity_iops_hdd              380.869888
osd.1             basic     osd_mclock_max_capacity_iops_hdd              441.000000
osd.10            basic     osd_mclock_max_capacity_iops_ssd              13677.906485
osd.11            basic     osd_mclock_max_capacity_iops_hdd              274.411212
osd.13            basic     osd_mclock_max_capacity_iops_hdd              198.492501
osd.2             basic     osd_mclock_max_capacity_iops_hdd              251.592009
osd.3             basic     osd_mclock_max_capacity_iops_hdd              208.197434
osd.4             basic     osd_mclock_max_capacity_iops_hdd              196.544082
osd.5             basic     osd_mclock_max_capacity_iops_ssd              12739.225456
osd.6             basic     osd_mclock_max_capacity_iops_hdd              211.288660
osd.7             basic     osd_mclock_max_capacity_iops_hdd              210.543236
osd.8             basic     osd_mclock_max_capacity_iops_hdd              242.241594
osd.9             basic     osd_mclock_max_capacity_iops_hdd              559.933780
mds.plexfs        basic     mds_join_fs                                   plexfs
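And this is how I have been comparing the value stored centrally with what a daemon is actually running with (osd.0 here is just an arbitrary example daemon):

  # value stored in the mon config database
  ceph config get osd osd_max_scrubs

  # value the running daemon reports
  ceph config show osd.0 osd_max_scrubs
  ceph tell osd.0 config get osd_max_scrubs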
Here is the output of ceph -s:

  services:
    mon: 3 daemons, quorum lxt-prod-ceph-util02,lxt-prod-ceph-util01,lxt-prod-ceph-util03 (age 3w)
    mgr: lxt-prod-ceph-util02.iyrhxj(active, since 3w), standbys: lxt-prod-ceph-util03.wvstpe
    mds: 1/1 daemons up
    osd: 14 osds: 14 up (since 4w), 14 in (since 4w)

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 193 pgs
    objects: 14.48M objects, 52 TiB
    usage:   71 TiB used, 39 TiB / 110 TiB avail
    pgs:     131 active+clean
             47  active+clean+scrubbing
             15  active+clean+scrubbing+deep