Why do you create so many PGs? The goal is ~100 per OSD; with your numbers you have

3 * 48000 / 140 ~= 1000 per OSD.
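
A minimal sketch of that arithmetic (the replication factor of 3 is an assumption here, since the pool size is not stated in the thread):

    # PG replicas per OSD, using the counts quoted in this thread
    replicas = 3                    # assumed pool size (replication factor)
    total_pgs = 8000 + 2 * 20000    # existing PGs plus the two new 20000-PG pools
    osds = 140
    print(total_pgs * replicas / osds)   # ~1028, roughly 10x the ~100-per-OSD target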


-- Dan van der Ster || Data & Storage Services || CERN IT Department --


On 13 Mar 2014 at 11:11:16, Kasper Dieter ([email protected]) wrote:

We have observed a very similar behavior.

In a 140-OSD cluster (newly created and idle) ~8000 PGs are available.
After adding two new pools (each with 20000 PGs),
100 out of the 140 OSDs go down and out.
The cluster never recovers.
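
For reference, pools with an explicit PG count like that would typically be created with the standard CLI (the pool names below are placeholders, not the ones actually used):

    ceph osd pool create pool-a 20000 20000
    ceph osd pool create pool-b 20000 20000

i.e. pg_num and pgp_num both set to 20000 at creation time.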

This problem can be reproduced every time with v0.67 and v0.72.

With v0.61 this problem does not show up.


-Dieter


On Thu, Mar 13, 2014 at 10:46:05AM +0100, Gandalf Corvotempesta wrote:
> 2014-03-13 9:02 GMT+01:00 Andrey Korolyov <[email protected]>:
> > Yes, if you have an essentially high amount of committed data in the cluster
> > and/or a large number of PGs (tens of thousands).
>
> I've increased from 64 to 8192 PGs
>
> > If you have room to
> > experiment with this transition from scratch, you may want to play with the
> > numbers in the OSD queues, since they cause deadlock-like behaviour on
> > operations like increasing the PG count or deleting a large pool. If the cluster
> > has no I/O at all at the moment, such behaviour is definitely not expected.
>
> My cluster was totally idle; it's a test deployment from the ceph-ansible repository and
> nobody was using it.
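
(As a rough illustration of the OSD-queue knobs Andrey is presumably referring to, these are the recovery/backfill throttles one would normally tune in ceph.conf; the values are only an example, not a recommendation from this thread:

    [osd]
    osd max backfills = 1          # concurrent backfills per OSD
    osd recovery max active = 1    # concurrent recovery ops per OSD
    osd recovery op priority = 1   # deprioritize recovery vs. client I/O
    osd op threads = 2             # size of the OSD op worker pool
)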
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
