Hi,
I already have the balancer enabled in upmap mode:
root@ld3955:/mnt/pve/pve_cephfs/template/iso# ceph balancer status
{
    "active": true,
    "plans": [],
    "mode": "upmap"
}
However, there are OSDs at 60% usage and others at 90%, all belonging to
the same pool and all with the same disk size.
That looks like a big range to me.
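If it helps, I can also share the output of the following; as far as I
understand the balancer docs, eval gives a score for the current
distribution (lower should be better):

ceph osd df                  # per-OSD %USE and VAR columns show the 60-90% spread
ceph balancer eval           # score of the current data distribution
ceph balancer eval-verbose   # per-pool detail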
Regards
Thomas
On 23.09.2019 at 11:42, EDH - Manuel Rios Fernandez wrote:
> Hi Thomas,
>
> For a 100% even byte distribution of data across the OSDs, you would have to
> set up the ceph balancer in "byte" mode, not in PG mode.
>
> That change will give all OSDs the same % of usage, but the objects will
> NOT be redundant.
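> A rough sketch of what I mean (the metrics option name is from memory,
> please double-check it against the balancer module docs before using it):
>
> ceph balancer off
> ceph balancer mode crush-compat
> ceph config set mgr mgr/balancer/crush_compat_metrics bytes   # score by bytes only
> ceph balancer on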
>
> After several weeks and months of testing the balancer, the best profile is
> balancing by PG with upmap.
>
> In PG mode you will always get (until the balancer gets a better algorithm)
> data that is not equally distributed, and you sometimes have to manually
> redistribute weight via the CLI, for example as sketched below.
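> Something along these lines works (dry-run first; the threshold of 110 and
> the OSD id are only examples):
>
> ceph osd test-reweight-by-utilization 110   # show what would change
> ceph osd reweight-by-utilization 110        # apply it
> ceph osd reweight 123 0.95                  # or nudge a single OSD by hand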
>
> From Nautilus on you can work with the balancer directly from the Dashboard.
> The balancer is not an "active" agent that is consulted before data is stored
> on disk; Ceph stores the data first, and then the balancer moves the objects afterwards.
>
> Regards
>
> Manuel
>
>
> -----Original Message-----
> From: Thomas <[email protected]>
> Sent: Monday, 23 September 2019 11:08
> To: [email protected]
> Subject: [ceph-users] OSD rebalancing issue - should drives be distributed
> equally over all nodes
>
> Hi,
>
> I'm facing several issues with my ceph cluster (2x MDS, 6x OSD nodes).
> Here I would like to focus on the issue with PGs in backfill_toofull.
> I assume this is related to the data distribution on my OSDs not being
> balanced.
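> For reference, this is what I am checking regarding the full thresholds; the
> 0.92 below is only an example of a temporary bump I am considering, not
> something I have applied:
>
> ceph osd dump | grep ratio            # shows full_ratio, backfillfull_ratio, nearfull_ratio
> ceph osd set-backfillfull-ratio 0.92  # temporary relief so backfill can proceed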
>
> This is the current ceph status:
> root@ld3955:~# ceph -s
> cluster:
> id: 6b1b5117-6e08-4843-93d6-2da3cf8a6bae
> health: HEALTH_ERR
> 1 MDSs report slow metadata IOs
> 78 nearfull osd(s)
> 1 pool(s) nearfull
> Reduced data availability: 2 pgs inactive, 2 pgs peering
> Degraded data redundancy: 304136/153251211 objects degraded
> (0.198%), 57 pgs degraded, 57 pgs undersized
> Degraded data redundancy (low space): 265 pgs backfill_toofull
> 3 pools have too many placement groups
> 74 slow requests are blocked > 32 sec
> 80 stuck requests are blocked > 4096 sec
>
> services:
> mon: 3 daemons, quorum ld5505,ld5506,ld5507 (age 98m)
> mgr: ld5505(active, since 3d), standbys: ld5506, ld5507
> mds: pve_cephfs:1 {0=ld3976=up:active} 1 up:standby
> osd: 368 osds: 368 up, 367 in; 302 remapped pgs
>
> data:
> pools: 5 pools, 8868 pgs
> objects: 51.08M objects, 195 TiB
> usage: 590 TiB used, 563 TiB / 1.1 PiB avail
> pgs: 0.023% pgs not active
> 304136/153251211 objects degraded (0.198%)
> 1672190/153251211 objects misplaced (1.091%)
> 8564 active+clean
> 196 active+remapped+backfill_toofull
> 57 active+undersized+degraded+remapped+backfill_toofull
> 35 active+remapped+backfill_wait
> 12 active+remapped+backfill_wait+backfill_toofull
> 2 active+remapped+backfilling
> 2 peering
>
> io:
> recovery: 18 MiB/s, 4 objects/s
>
>
> Currently I'm using 6 OSD nodes.
> Node A
> 48x 1.6TB HDD
> Node B
> 48x 1.6TB HDD
> Node C
> 48x 1.6TB HDD
> Node D
> 48x 1.6TB HDD
> Node E
> 48x 7.2TB HDD
> Node F
> 48x 7.2TB HDD
>
> Question:
> Is it advisable to distribute the drives equally over all nodes?
> If yes, how should this be executed without disrupting Ceph?
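> For context: assuming the default CRUSH weight per OSD matches its size, the
> two 7.2TB nodes each carry roughly 4.5x the CRUSH weight of a 1.6TB node.
> I am looking at the host weights with:
>
> ceph osd crush tree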
>
> Regards
> Thomas
>
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]