On 01/08/2013 18:42, Loic Dachary wrote:
> Hi Sam,
>
> When the acting set changes order, two chunks for the same object may
> co-exist in the same placement group. The key should therefore also
> contain the chunk number.
>
> That's probably the most sensible comment I have so far. This document
> is immensely useful (even in its current state) because it shows me
> your perspective on the implementation.
>
> I'm puzzled by:

I get it (thanks to yanzheng). An object is deleted, then created
again ... spurious unversioned chunks would get in the way. :-)

>
> CEPH_OSD_OP_DELETE: The possibility of rolling back a delete requires
> that we retain the deleted object until all replicas have persisted
> the deletion event. ErasureCoded backend will therefore need to store
> objects with the version at which they were created included in the
> key provided to the filestore. Old versions of an object can be pruned
> when all replicas have committed up to the log event deleting the
> object.
>
> because I don't understand why the version would be necessary. I
> thought that deleting an erasure-coded object could be even easier
> than erasing a replicated object because it cannot be resurrected if
> enough chunks are lost, therefore you don't need to wait for an ack
> from all OSDs in the up set. I'm obviously missing something.
>
> I failed to understand how important the pg logs were to maintaining
> the consistency of the PG. For some reason I thought about them only
> in terms of being a lightweight version of the operation logs. Adding
> a payload to the pg_log_entry (i.e. APPEND size or attribute) is a new
> idea for me and I would never have thought or dared think the logs
> could be extended in such a way. Given the recent problems with log
> writes having a high impact on performance (I'm referring to what
> forced you to introduce code that writes only the log entries that
> have changed instead of the complete log) I thought of the pg logs as
> something immutable.
>
> I'm still trying to figure out how PGBackend::perform_write / read /
> try_rollback would fit in the current backfilling / write / read /
> scrubbing ... code path.
>
> https://github.com/athanatos/ceph/blob/ba5c97eda4fe72a25831031a2cffb226fed8d9b7/doc/dev/osd_internals/erasure_coding.rst
> https://github.com/athanatos/ceph/blob/ba5c97eda4fe72a25831031a2cffb226fed8d9b7/src/osd/PGBackend.h
>
> Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre
All that is necessary for the triumph of evil is that good people do nothing.
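A minimal sketch of the key scheme discussed in this thread: each chunk is stored under (object name, creation version, chunk number), a delete only records a tombstone so it can be rolled back, and the old incarnation is pruned once every replica has committed past the delete log event. ChunkKey, ShardStore, create, delete, rollback_delete and prune are hypothetical names with plain integer versions; this is not the actual Ceph hobject_t / ObjectStore API.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True, order=True)
class ChunkKey:
    # Hypothetical filestore key: the creation version keeps a re-created
    # object from colliding with its deleted predecessor, and the chunk
    # number keeps two chunks of one object apart in the same PG.
    name: str
    created_at: int
    chunk: int


class ShardStore:
    def __init__(self) -> None:
        self.chunks: Dict[ChunkKey, bytes] = {}
        self.live: Dict[str, int] = {}  # name -> creation version of live incarnation
        self.tombstones: List[Tuple[ChunkKey, int]] = []  # (key, delete version)

    def create(self, name: str, version: int, chunk: int, data: bytes) -> ChunkKey:
        key = ChunkKey(name, version, chunk)
        self.chunks[key] = data
        self.live[name] = version
        return key

    def delete(self, name: str, version: int) -> None:
        # Keep the data: only remember which incarnation was deleted and at
        # which log version, so the delete can still be rolled back.
        created = self.live.pop(name)
        for key in self.chunks:
            if key.name == name and key.created_at == created:
                self.tombstones.append((key, version))

    def rollback_delete(self, name: str, version: int) -> None:
        # Undo the delete logged at `version`; its chunks are still stored.
        remaining = []
        for key, deleted_at in self.tombstones:
            if key.name == name and deleted_at == version:
                self.live[name] = key.created_at
            else:
                remaining.append((key, deleted_at))
        self.tombstones = remaining

    def prune(self, min_committed: int) -> int:
        # min_committed: lowest log version persisted by all replicas.
        # Chunks whose delete event is at or below it can no longer be
        # rolled back, so they are removed. Returns the number removed.
        kept, removed = [], 0
        for key, deleted_at in self.tombstones:
            if deleted_at <= min_committed:
                del self.chunks[key]
                removed += 1
            else:
                kept.append((key, deleted_at))
        self.tombstones = kept
        return removed
```

Deleting "foo" at version 2 and re-creating it at version 3 leaves the version 1 chunks untouched under distinct keys until prune(2) is called, which is the "spurious unversioned chunks" situation the versioned key avoids.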
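The idea of adding a payload to the pg_log_entry (the size before an APPEND, the previous value of an attribute) can be illustrated with a small rollback sketch: divergent entries are undone newest first using only what the log entry carries. LogEntry, Obj, apply_append, apply_setattr and rollback_to are hypothetical names for illustration; this is not Ceph's pg_log_entry_t nor the PGBackend::try_rollback interface.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class LogEntry:
    # Hypothetical log entry carrying just enough payload to undo itself.
    version: int
    op: str                           # "append" or "setattr"
    name: str
    old_size: int = 0                 # "append": object size before the write
    attr: str = ""
    old_attr: Optional[bytes] = None  # "setattr": previous value (None = absent)


class Obj:
    def __init__(self) -> None:
        self.data = bytearray()
        self.attrs: Dict[str, bytes] = {}


def apply_append(log: List[LogEntry], objs: Dict[str, Obj],
                 version: int, name: str, payload: bytes) -> None:
    obj = objs.setdefault(name, Obj())
    log.append(LogEntry(version, "append", name, old_size=len(obj.data)))
    obj.data += payload


def apply_setattr(log: List[LogEntry], objs: Dict[str, Obj],
                  version: int, name: str, attr: str, value: bytes) -> None:
    obj = objs.setdefault(name, Obj())
    log.append(LogEntry(version, "setattr", name, attr=attr,
                        old_attr=obj.attrs.get(attr)))
    obj.attrs[attr] = value


def rollback_to(log: List[LogEntry], objs: Dict[str, Obj], to_version: int) -> None:
    # Undo every entry newer than to_version, newest first.
    while log and log[-1].version > to_version:
        e = log.pop()
        obj = objs[e.name]
        if e.op == "append":
            del obj.data[e.old_size:]  # truncate back to the pre-append size
        elif e.op == "setattr":
            if e.old_attr is None:
                obj.attrs.pop(e.attr, None)
            else:
                obj.attrs[e.attr] = e.old_attr
```

Because an append only ever extends a chunk, recording the previous size is enough to roll it back by truncation, without keeping a copy of the overwritten data.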
