min_size will also block reads. Just to add a +1 to what has been said: a write operation will always wait to ack until all OSDs in a PG's acting set have acked the write. min_size has absolutely no effect on this. The min_size check happens BEFORE the read or write is handled by any OSD. If the PG does not have at least min_size active replicas, both reads and writes will block until it does.
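The ordering described above can be sketched as a toy model (this is illustrative pseudologic only, not Ceph's actual implementation; the `PG` class and its fields are invented for the example):

```python
# Toy model of the PG write path described above (NOT Ceph's real code;
# all names here are invented for illustration).

class PG:
    def __init__(self, size, min_size, active_replicas):
        self.size = size                        # desired number of copies
        self.min_size = min_size                # minimum replicas for I/O
        self.active_replicas = active_replicas  # OSDs currently serving the PG

    def write(self, data):
        # min_size is checked BEFORE any OSD handles the op: too few
        # active replicas and the op blocks (here: raises instead).
        if self.active_replicas < self.min_size:
            raise BlockingIOError("PG below min_size; write blocks")
        # The write is acked to the client only after EVERY active
        # replica has acked it -- min_size plays no part in this step.
        acks = [f"osd.{i} acked" for i in range(self.active_replicas)]
        return len(acks) == self.active_replicas

pg = PG(size=3, min_size=2, active_replicas=3)
print(pg.write(b"hello"))   # True: all 3 replicas acked
```

The point of the sketch is the ordering: the min_size gate is a precondition on the whole op, separate from (and earlier than) the wait-for-all-replica-acks step.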
On Wed, Jan 3, 2018 at 9:59 AM Ronny Aasen <[email protected]> wrote:

> On 03. jan. 2018 14:51, James Poole wrote:
> > Hi all,
> >
> > Whilst on a training course recently I was told that 'min_size' had an
> > effect on client write performance, in that it's the required number of
> > copies before Ceph reports back to the client that an object has been
> > written, therefore setting a 'min_size' of 0 would only require a write
> > to be accepted by the journal before confirming it has been accepted.
> >
> > This is contrary to further reading elsewhere that 'min_size' is the
> > minimum number of copies of an object required to allow I/O, and that
> > 'size' is the parameter that would affect write speed, i.e. the desired
> > number of replicas.
> >
> > Setting 'min_size' to 0 with a 'size' of 3 you would still have an
> > effective 'min_size' of 2, from:
> >
> > https://raw.githubusercontent.com/ceph/ceph/master/doc/release-notes.rst
> >
> > "* Degraded mode (when there are fewer than the desired number of replicas)
> >    is now more configurable on a per-pool basis, with the min_size
> >    parameter. By default, with min_size 0, this allows I/O to objects
> >    with N - floor(N/2) replicas, where N is the total number of
> >    expected copies. Argonaut behavior was equivalent to having min_size
> >    = 1, so I/O would always be possible if any completely up to date
> >    copy remained. min_size = 1 could result in lower overall
> >    availability in certain cases, such as a flapping network partition."
> >
> > Which leads to the conclusion that changing 'min_size' has nothing to do
> > with performance but is solely related to data integrity/resilience.
> >
> > Could someone confirm my assertion is correct?
> >
> > Many thanks
> >
> > James
>
> You are correct that it is related to data integrity.
>
> Writes to an OSD filestore are always acked internally once they have
> hit the journal, unrelated to size/min_size.
> In normal operation, all OSDs must ack the write before the write is
> acked to the client: in other words, all 3 (size 3) must ack, and
> min_size is not relevant in any case.
>
> min_size is only relevant when a PG is degraded while being remapped or
> backfilled (or degraded because there is no space to remap/backfill
> into) due to an OSD or node failure. In that case min_size specifies how
> many OSDs must ack the write before the write is acked to the client.
>
> Since failure is most likely when disks are under stress (e.g. during a
> rebuild), reducing min_size is just asking for corruption and data loss.
>
> Kind regards
> Ronny Aasen
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
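The release-notes formula quoted earlier (with min_size left at 0, I/O is allowed with N - floor(N/2) replicas) can be checked with a few lines. A small sketch; the function name is ours, not a Ceph API:

```python
import math

def effective_min_size(size, min_size=0):
    """Effective min_size per the quoted release notes: with min_size 0,
    I/O is allowed with N - floor(N/2) replicas, where N is pool 'size'.
    An explicit non-zero min_size is used as-is."""
    if min_size == 0:
        return size - math.floor(size / 2)
    return min_size

for n in (2, 3, 4, 5):
    print(n, effective_min_size(n))
# size 3 -> effective min_size 2, matching the thread's conclusion
```

This confirms the point in the thread: setting min_size to 0 with size 3 still leaves an effective min_size of 2.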
