On 01/08/2013 18:42, Loic Dachary wrote:
> Hi Sam,
> 
> When the acting set changes order, two chunks of the same object may coexist 
> in the same placement group. The key should therefore also contain the chunk 
> number. 
> 
> That's probably the most sensible comment I have so far. This document is 
> immensely useful (even in its current state) because it shows me your 
> perspective on the implementation. 
> 
> I'm puzzled by:

I get it (thanks to yanzheng): if an object is deleted and then created again,
spurious non-versioned chunks would get in the way.

:-)
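
To make the delete-then-recreate case concrete, here is a minimal sketch in
Python. It is not Ceph code: ChunkKey, ToyStore and their methods are
hypothetical names invented for this illustration, and a dict stands in for
the filestore. It only shows why a key that carries the creation version (and
the chunk number) lets the retained old chunk and the newly created chunk
coexist until the delete is either rolled back or pruned.

```python
from typing import Dict, List, NamedTuple


class ChunkKey(NamedTuple):
    """Hypothetical filestore key; not the real Ceph key layout."""
    name: str        # object name
    created_at: int  # version at which this incarnation of the object was created
    chunk: int       # erasure-code chunk number


class ToyStore:
    """Dict-backed stand-in for the chunks held by one OSD."""

    def __init__(self) -> None:
        self.chunks: Dict[ChunkKey, bytes] = {}

    def write(self, key: ChunkKey, data: bytes) -> None:
        self.chunks[key] = data

    def rollback(self, name: str, to_version: int) -> None:
        """Drop every incarnation of `name` created after `to_version`."""
        for key in [k for k in self.chunks
                    if k.name == name and k.created_at > to_version]:
            del self.chunks[key]

    def prune(self, name: str, delete_version: int) -> None:
        """Drop incarnations created before a delete that everyone has committed."""
        for key in [k for k in self.chunks
                    if k.name == name and k.created_at < delete_version]:
            del self.chunks[key]

    def versions(self, name: str) -> List[int]:
        return sorted({k.created_at for k in self.chunks if k.name == name})


def demo() -> ToyStore:
    store = ToyStore()
    store.write(ChunkKey("foo", 1, 0), b"old")  # object created at version 1
    # version 2: delete is logged, but the old chunk is retained for rollback
    store.write(ChunkKey("foo", 3, 0), b"new")  # object recreated at version 3
    return store
```

With a key made of the name alone, the write at version 3 would overwrite the
chunk retained for version 1 and the delete could no longer be rolled back.
With the version in the key, rollback("foo", 1) restores the old object, and
prune("foo", 2) discards it once every replica has committed the delete.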

> 
> CEPH_OSD_OP_DELETE: The possibility of rolling back a delete requires that we 
> retain the deleted object until all replicas have persisted the deletion 
> event. ErasureCoded backend will therefore need to store objects with the 
> version at which they were created included in the key provided to the 
> filestore. Old versions of an object can be pruned when all replicas have 
> committed up to the log event deleting the object.
> 
> because I don't understand why the version would be necessary. I thought that 
> deleting an erasure coded object could be even easier than deleting a 
> replicated object because it cannot be resurrected if enough chunks are lost; 
> therefore you would not need to wait for an ack from all OSDs in the up set. 
> I'm obviously missing something.
> 
> I failed to understand how important the pg logs were to maintaining the 
> consistency of the PG. For some reason I thought of them only as a 
> lightweight version of the operation logs. Adding a payload to the 
> pg_log_entry ( i.e. APPEND size or attribute ) is a new idea for me, and I 
> would never have thought, or dared to think, that the logs could be extended 
> in such a way. Given the recent problems with log writes having a high impact 
> on performance ( I'm referring to what forced you to introduce code that 
> writes only the log entries that have changed instead of the complete logs ), 
> I thought of the pg logs as something immutable.
> 
> I'm still trying to figure out how PGBackend::perform_write / read / 
> try_rollback would fit in the current backfilling / write / read / scrubbing 
> ... code path. 
> 
> https://github.com/athanatos/ceph/blob/ba5c97eda4fe72a25831031a2cffb226fed8d9b7/doc/dev/osd_internals/erasure_coding.rst
> https://github.com/athanatos/ceph/blob/ba5c97eda4fe72a25831031a2cffb226fed8d9b7/src/osd/PGBackend.h
> 
> Cheers
> 

-- 
Loïc Dachary, Artisan Logiciel Libre
All that is necessary for the triumph of evil is that good people do nothing.
