I recently upgraded from 16.2.x to 18.2.x and migrated to cephadm. Since the
switch the cluster has been scrubbing constantly, 24/7, with up to 50 PGs
scrubbing and up to 20 deep scrubs running simultaneously in a cluster that
has only 12 OSDs in use.
Despite all that scrubbing, it still regularly raises a 'pgs not scrubbed in
time' warning.
I have tried adjusting various settings, such as osd_deep_scrub_interval,
osd_max_scrubs, mds_max_scrub_ops_in_progress, etc., but they all seem to be
ignored.
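For reference, the commands I used were along these lines (the option names are the real ones I tried; the values shown here are only examples from my experiments, not what is currently configured):

  # store scrub-related limits in the mon config database
  ceph config set osd osd_max_scrubs 1
  ceph config set osd osd_deep_scrub_interval 1209600   # 14 days, in seconds
  ceph config set mds mds_max_scrub_ops_in_progress 5

  # confirm the values were stored
  ceph config get osd osd_max_scrubs
  ceph config get osd osd_deep_scrub_interval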
Please advise.
Here is the output of ceph config dump:
WHO         MASK  LEVEL     OPTION                                        VALUE                      RO
global            advanced  auth_client_required                          cephx                      *
global            advanced  auth_cluster_required                         cephx                      *
global            advanced  auth_service_required                         cephx                      *
global            advanced  auth_supported                                cephx                      *
global            basic     container_image                               quay.io/ceph/ceph@sha256:aca35483144ab3548a7f670db9b79772e6fc51167246421c66c0bd56a6585468  *
global            basic     device_failure_prediction_mode                local
global            advanced  mon_allow_pool_delete                         true
global            advanced  mon_data_avail_warn                           20
global            advanced  mon_max_pg_per_osd                            400
global            advanced  osd_max_pg_per_osd_hard_ratio                 10.000000
global            advanced  osd_pool_default_pg_autoscale_mode            on
mon               advanced  auth_allow_insecure_global_id_reclaim         false
mon               advanced  mon_crush_min_required_version                firefly                    *
mon               advanced  mon_warn_on_pool_no_redundancy                false
mon               advanced  public_network                                10.79.0.0/16               *
mgr               advanced  mgr/balancer/active                           true
mgr               advanced  mgr/balancer/mode                             upmap
mgr               advanced  mgr/cephadm/manage_etc_ceph_ceph_conf_hosts   label:admin                *
mgr               advanced  mgr/cephadm/migration_current                 6                          *
mgr               advanced  mgr/dashboard/GRAFANA_API_PASSWORD            admin                      *
mgr               advanced  mgr/dashboard/GRAFANA_API_SSL_VERIFY          false                      *
mgr               advanced  mgr/dashboard/GRAFANA_API_URL                 https://10.79.79.12:3000   *
mgr               advanced  mgr/dashboard/PROMETHEUS_API_HOST             http://10.79.79.12:9095    *
mgr               advanced  mgr/devicehealth/enable_monitoring            true
mgr               advanced  mgr/orchestrator/orchestrator                 cephadm
osd               advanced  osd_map_cache_size                            250
osd               advanced  osd_map_share_max_epochs                      50
osd               advanced  osd_mclock_profile                            high_client_ops
osd               advanced  osd_pg_epoch_persisted_max_stale              50
osd.0             basic     osd_mclock_max_capacity_iops_hdd              380.869888
osd.1             basic     osd_mclock_max_capacity_iops_hdd              441.000000
osd.10            basic     osd_mclock_max_capacity_iops_ssd              13677.906485
osd.11            basic     osd_mclock_max_capacity_iops_hdd              274.411212
osd.13            basic     osd_mclock_max_capacity_iops_hdd              198.492501
osd.2             basic     osd_mclock_max_capacity_iops_hdd              251.592009
osd.3             basic     osd_mclock_max_capacity_iops_hdd              208.197434
osd.4             basic     osd_mclock_max_capacity_iops_hdd              196.544082
osd.5             basic     osd_mclock_max_capacity_iops_ssd              12739.225456
osd.6             basic     osd_mclock_max_capacity_iops_hdd              211.288660
osd.7             basic     osd_mclock_max_capacity_iops_hdd              210.543236
osd.8             basic     osd_mclock_max_capacity_iops_hdd              242.241594
osd.9             basic     osd_mclock_max_capacity_iops_hdd              559.933780
mds.plexfs        basic     mds_join_fs                                   plexfs
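And this is how I have been comparing the value stored centrally with what a daemon is actually running with (osd.0 here is just an arbitrary example daemon):

  # value stored in the mon config database
  ceph config get osd osd_max_scrubs

  # value the running daemon reports
  ceph config show osd.0 osd_max_scrubs
  ceph tell osd.0 config get osd_max_scrubs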
Here is the output of ceph -s:

  services:
    mon: 3 daemons, quorum lxt-prod-ceph-util02,lxt-prod-ceph-util01,lxt-prod-ceph-util03 (age 3w)
    mgr: lxt-prod-ceph-util02.iyrhxj(active, since 3w), standbys: lxt-prod-ceph-util03.wvstpe
    mds: 1/1 daemons up
    osd: 14 osds: 14 up (since 4w), 14 in (since 4w)

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 193 pgs
    objects: 14.48M objects, 52 TiB
    usage:   71 TiB used, 39 TiB / 110 TiB avail
    pgs:     131 active+clean
             47  active+clean+scrubbing
             15  active+clean+scrubbing+deep