Thomas, check the documentation for the CLI command "ceph osd reweight-by-utilization" and run it.
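For example, a rough sequence could look like this (the threshold and step values below are only illustrative, check the docs of your release for the exact defaults):

# show current per-OSD utilization
ceph osd df tree
# dry run: show which OSDs would be reweighted (overload threshold 120%, max weight change 0.05, at most 10 OSDs)
ceph osd test-reweight-by-utilization 120 0.05 10
# apply the same change if the dry run looks reasonable
ceph osd reweight-by-utilization 120 0.05 10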
Another problem with unbalanced data is the reduced globally available capacity of the pool.

Regards

Manuel

-----Original Message-----
From: Thomas <[email protected]>
Sent: Monday, September 23, 2019 11:49
To: EDH - Manuel Rios Fernandez <[email protected]>; [email protected]
Subject: Re: [ceph-users] OSD rebalancing issue - should drives be distributed equally over all nodes

Hi,
I already have balancer mode upmap enabled.

root@ld3955:/mnt/pve/pve_cephfs/template/iso# ceph balancer status
{
    "active": true,
    "plans": [],
    "mode": "upmap"
}

However, there are OSDs with 60% and others with 90% usage belonging to the same pool with the same disk size.
This looks like a big range to me.

Regards
Thomas

On 23.09.2019 at 11:42, EDH - Manuel Rios Fernandez wrote:
> Hi Thomas,
>
> For 100% byte distribution of data across OSDs, you should set up the ceph balancer in "byte" mode, not in PG mode.
>
> The change will give all OSDs the same % of usage, but the objects will NOT be redundant.
>
> After several weeks and months of testing the balancer, the best profile is balancing by PG with upmap.
>
> In PG mode you will always get (until the balancer gets a better algorithm) data that is not equally distributed, and you sometimes have to redistribute weight manually via the CLI.
>
> You can manage the balancer directly from the Dashboard since Nautilus. The balancer is not an "active" agent consulted before data is stored on disk; first Ceph stores the data, and then the balancer moves objects.
>
> Regards
>
> Manuel
>
>
> -----Original Message-----
> From: Thomas <[email protected]>
> Sent: Monday, September 23, 2019 11:08
> To: [email protected]
> Subject: [ceph-users] OSD rebalancing issue - should drives be distributed equally over all nodes
>
> Hi,
>
> I'm facing several issues with my ceph cluster (2x MDS, 6x OSD).
> Here I would like to focus on the issue with pgs backfill_toofull.
> I assume this is related to the fact that the data distribution on my OSDs is not balanced.
>
> This is the current ceph status:
> root@ld3955:~# ceph -s
>   cluster:
>     id:     6b1b5117-6e08-4843-93d6-2da3cf8a6bae
>     health: HEALTH_ERR
>             1 MDSs report slow metadata IOs
>             78 nearfull osd(s)
>             1 pool(s) nearfull
>             Reduced data availability: 2 pgs inactive, 2 pgs peering
>             Degraded data redundancy: 304136/153251211 objects degraded (0.198%), 57 pgs degraded, 57 pgs undersized
>             Degraded data redundancy (low space): 265 pgs backfill_toofull
>             3 pools have too many placement groups
>             74 slow requests are blocked > 32 sec
>             80 stuck requests are blocked > 4096 sec
>
>   services:
>     mon: 3 daemons, quorum ld5505,ld5506,ld5507 (age 98m)
>     mgr: ld5505(active, since 3d), standbys: ld5506, ld5507
>     mds: pve_cephfs:1 {0=ld3976=up:active} 1 up:standby
>     osd: 368 osds: 368 up, 367 in; 302 remapped pgs
>
>   data:
>     pools:   5 pools, 8868 pgs
>     objects: 51.08M objects, 195 TiB
>     usage:   590 TiB used, 563 TiB / 1.1 PiB avail
>     pgs:     0.023% pgs not active
>              304136/153251211 objects degraded (0.198%)
>              1672190/153251211 objects misplaced (1.091%)
>              8564 active+clean
>              196  active+remapped+backfill_toofull
>              57   active+undersized+degraded+remapped+backfill_toofull
>              35   active+remapped+backfill_wait
>              12   active+remapped+backfill_wait+backfill_toofull
>              2    active+remapped+backfilling
>              2    peering
>
>   io:
>     recovery: 18 MiB/s, 4 objects/s
>
>
> Currently I'm using 6 OSD nodes.
> Node A
> 48x 1.6TB HDD
> Node B
> 48x 1.6TB HDD
> Node C
> 48x 1.6TB HDD
> Node D
> 48x 1.6TB HDD
> Node E
> 48x 7.2TB HDD
> Node F
> 48x 7.2TB HDD
>
> Question:
> Is it advisable to distribute the drives equally over all nodes?
> If yes, how should this be executed w/o ceph disruption?
>
> Regards
> Thomas
>
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
