Hi Jon, can you reweight one OSD back to its default value and share the output of "ceph osd df tree; ceph -s; ceph health detail"?
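Something along these lines should do it (osd.12 is only a placeholder id here; 1.0 is the default reweight):

# ceph osd reweight 12 1.0
# ceph osd df tree; ceph -s; ceph health detail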
Recently I was adding a new node, 12x 4TB, one disk at a time, and hit the activating+remapped state for a few hours. I am not sure, but that may have been caused by the "osd_max_backfills" value and the queue of PGs waiting for backfill (see the P.S. below for an example of adjusting that throttle).

# ceph -s
>   cluster:
>     id:     1023c49f-3a10-42de-9f62-9b122db21e1e
>     health: HEALTH_WARN
>             noscrub,nodeep-scrub flag(s) set
>             1 nearfull osd(s)
>             19 pool(s) nearfull
>             33336982/289660233 objects misplaced (11.509%)
>             Reduced data availability: 29 pgs inactive
>             Degraded data redundancy: 788023/289660233 objects degraded
>             (0.272%), 782 pgs unclean, 54 pgs degraded, 48 pgs undersized
>
>   services:
>     mon: 3 daemons, quorum mon1,mon2,mon3
>     mgr: mon2(active), standbys: mon3, mon1
>     osd: 120 osds: 120 up, 120 in; 779 remapped pgs
>          flags noscrub,nodeep-scrub
>     rgw: 3 daemons active
>
>   data:
>     pools:   19 pools, 3760 pgs
>     objects: 38285k objects, 146 TB
>     usage:   285 TB used, 150 TB / 436 TB avail
>     pgs:     0.771% pgs not active
>              788023/289660233 objects degraded (0.272%)
>              33336982/289660233 objects misplaced (11.509%)
>              2978 active+clean
>              646  active+remapped+backfill_wait
>              57   active+remapped+backfilling
>              27   active+undersized+degraded+remapped+backfill_wait
>              25   activating+remapped
>              17   active+undersized+degraded+remapped+backfilling
>              4    activating+undersized+degraded+remapped
>              3    active+recovery_wait+degraded
>              3    active+recovery_wait+degraded+remapped
>
>   io:
>     client:   2228 kB/s rd, 54831 kB/s wr, 539 op/s rd, 756 op/s wr
>     recovery: 1360 MB/s, 348 objects/s

Now all PGs are active+clean.

Regards
Jakub
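P.S. In case the activating+remapped PGs really were just waiting on backfill slots: the sketch below shows how such a throttle can be loosened at runtime (the value 2 is only an example, check the disks' headroom first), and how the current setting can be read back over the admin socket on a host that runs osd.0.

# ceph tell osd.* injectargs '--osd_max_backfills 2'
# ceph daemon osd.0 config get osd_max_backfills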
