min_size will also block reads.  Just to add a +1 to what has been said: a
write operation will always wait to ack until all OSDs for a PG have acked
the write; min_size has absolutely no effect on this.  min_size is checked
BEFORE the write or read is handled by any OSDs.  If the PG does not have
min_size active replicas, the read or write will block until it does.
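To make the arithmetic concrete, the release notes quoted further down in this
thread give the degraded-mode default as N - floor(N/2) replicas when min_size
is 0. A minimal sketch of that rule (the function name is mine, not anything
from the Ceph codebase):

```python
def effective_min_size(size: int, min_size: int = 0) -> int:
    """Effective min_size per the quoted release-notes rule:
    if min_size is 0 (unset), fall back to N - floor(N/2),
    where N is the pool's size (desired replica count)."""
    if min_size > 0:
        return min_size
    return size - size // 2

# With size=3 and min_size left at 0, I/O still requires 2 replicas:
print(effective_min_size(3))     # 2
print(effective_min_size(3, 1))  # 1 (explicit Argonaut-style behaviour)
```

So a pool with size 3 and min_size 0 still has an effective min_size of 2,
which matches the conclusion James reaches below.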

On Wed, Jan 3, 2018 at 9:59 AM Ronny Aasen <[email protected]>
wrote:

> On 03. jan. 2018 14:51, James Poole wrote:
> > Hi all,
> >
> > Whilst on a training course recently I was told that 'min_size' had an
> > effect on client write performance, in that it's the required number of
> > copies before Ceph reports back to the client that an object has been
> > written, and that therefore setting a 'min_size' of 0 would only require
> > a write to be accepted by the journal before confirming it's been accepted.
> >
> > This is contrary to further reading elsewhere that the 'min_size' is the
> > minimum number of copies required of an object to allow I/O and that
> > 'size' is the parameter that would affect write speed i.e. desired
> > number of replicas.
> >
> > Setting 'min_size' to 0 with a 'size' of 3 you would still have an
> > effective 'min_size' of 2 from:
> >
> > https://raw.githubusercontent.com/ceph/ceph/master/doc/release-notes.rst
> >
> > "* Degraded mode (when there are fewer than the desired number of replicas)
> > is now more configurable on a per-pool basis, with the min_size
> > parameter. By default, with min_size 0, this allows I/O to objects
> > with N - floor(N/2) replicas, where N is the total number of
> > expected copies. Argonaut behavior was equivalent to having min_size
> > = 1, so I/O would always be possible if any completely up to date
> > copy remained. min_size = 1 could result in lower overall
> > availability in certain cases, such as flapping network partition"
> >
> > Which leads to the conclusion that changing 'min_size' has nothing to do
> > with performance but is solely related to data integrity/resilience.
> >
> > Could someone confirm my assertion is correct?
> >
> > Many thanks
> >
> > James
>
>
> you are correct that it is related to data integrity.
>
>
> writes to an OSD filestore are always acked internally once they have
> hit the journal, unrelated to size/min_size.
>
> in normal operation, all OSDs must ack the write before the write is
> acked to the client: in other words, all 3 (with size 3) must ack, and
> min_size is not relevant in any case.
>
> min_size is only relevant when a PG is degraded while being remapped or
> backfilled (or degraded because there is no space to remap/backfill
> into) after an OSD or node failure. in that case min_size specifies how
> many OSDs must ack the write before the write is acked to the client.
>
> since failure is most likely when disks are stressed (e.g. during a
> rebuild), reducing min_size is just asking for corruption and data loss.
>
> kind regards
> Ronny Aasen
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>