Hi,

About BlueStore: sure, there are checksums, but are they fully used?
Rumor has it that on a replicated pool, during recovery, they are not.


> My thoughts on the subject are that even though checksums do allow to find 
> which replica is corrupt without having to figure which 2 out of 3 copies are 
> the same, this is not the only reason min_size=2 was required. Even if you 
> are running all SSD which are more reliable than HDD and are keeping the disk 
> size small so you could backfill quickly in case of a single disk failure, 
> you would still occasionally have longer periods of degraded operation. To 
> name a couple - a full node going down; or operator deliberately wiping an 
> OSD to rebuild it. min_size=1 in this case would leave you running with no 
> redundancy at all. DR scenario with pool-to-pool mirroring probably means 
> that you can not just replace the lost or incomplete PGs in your main site 
> from your DR, cause DR is likely to have a different PG layout, so full 
> resync from DR would be required in case of one disk lost during such 
> unprotected times.

I have to say, this is a common yet worthless argument.
If I have 3000 OSDs, using 2 or 3 replicas will not change much: the
probability of losing 2 devices at the same time is still "high".

On the other hand, if I have a small cluster of fewer than a hundred
OSDs, that same probability becomes "low".
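To put rough numbers on this, here is a minimal sketch of the arithmetic, assuming independent disk failures with a constant failure rate; the 3% annual failure rate and 8-hour backfill window are illustrative assumptions, not measured values:

```python
import math

def p_second_failure(n_osds, afr=0.03, recovery_hours=8.0):
    """Probability that at least one of the remaining OSDs fails
    during the recovery window after a first disk loss.

    Assumes independent failures with a constant per-disk rate
    (exponential model). afr and recovery_hours are illustrative.
    """
    rate_per_hour = afr / (365 * 24)           # per-disk hourly failure rate
    disk_hours = (n_osds - 1) * recovery_hours  # total exposure at risk
    return 1.0 - math.exp(-rate_per_hour * disk_hours)

if __name__ == "__main__":
    for n in (100, 3000):
        print(f"{n:>5} OSDs: {p_second_failure(n):.3%}")
```

With these assumptions, the chance of a second failure during one backfill is roughly 0.3% at 100 OSDs but closer to 8% at 3000 OSDs, which is the point above: cluster size dominates the risk far more than the replica count does.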

I do not buy the "what if someone is doing maintenance and a device
fails" argument either: this is a no-limit goal. What if X servers burn
at the same time? What if an admin makes a mistake and drops 5 OSDs?
What if some top-of-rack switches or routers blow away?
Should we keep one replica per OSD?


Thus, I would like to emphasize the technical sanity of using 2
replicas, versus the organisational sanity of doing so.

Organisational matters are specific to each deployment; technical ones
are shared by all clusters.

I would like people, especially Ceph's devs and other people who know
how it works deeply (read the code!), to give us their advice.

Regards,
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
