Here's what I'm thinking:

Above the ObjectStore, we clear everything from temp on restart anyway.

We always write a temp object in pieces, and then at the end 
copy/move it into the main collection.  

On replay, we should *only* do that final move/rename if the temp object 
was replayed in its entirety.

So:

- clear out temp collections in the filestore on startup.

- give temp objects unique names so that they don't collide with non-temp 
object fd caching (or whatever else).  for the DBObjectMap part there is 
probably some futzing needed, though, to make this work right.

- add a new 'move_from_temp' type operation that renames an object from a 
temp (coll_t::is_temp()) collection to a non-temp one.  it will succeed iff 
the temp source exists.

- all operations that write to temp objects fail if the object doesn't 
already exist, except an explicit 'create' op

- all transactions the osds generate that write to temp objects start with 
that explicit create.

The combination of these things means that we will only have a temp source 
for the move_from_temp op if it is complete.  Which I think means we can 
avoid any of the fsync guard stuff entirely.
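
To make the replay argument concrete, here is a rough toy model of the rules 
above.  It is only a sketch: ToyStore, Op, OpType, and the two in-memory maps 
are made-up stand-ins, not the real FileStore or ObjectStore interfaces, and 
it assumes a failed op is simply skipped during replay.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical op types; these names are illustrative only.
enum class OpType { Create, Write, MoveFromTemp };

struct Op {
  OpType type;
  std::string name;  // temp object name (the source for MoveFromTemp)
  std::string arg;   // payload for Write; destination name for MoveFromTemp
};

struct ToyStore {
  std::map<std::string, std::string> temp_coll;  // stand-in for a temp collection
  std::map<std::string, std::string> main_coll;  // stand-in for a non-temp collection

  // Startup rule: throw away every temp object.
  void clear_temp() { temp_coll.clear(); }

  // Apply one op; returns false if the op is rejected.
  bool apply(const Op& op) {
    switch (op.type) {
    case OpType::Create:
      temp_coll[op.name] = "";  // explicit create (resets any existing content)
      return true;
    case OpType::Write: {
      auto it = temp_coll.find(op.name);
      if (it == temp_coll.end())
        return false;  // writes never create a temp object implicitly
      it->second += op.arg;
      return true;
    }
    case OpType::MoveFromTemp: {
      auto it = temp_coll.find(op.name);
      if (it == temp_coll.end())
        return false;  // succeeds iff the temp source exists
      main_coll[op.arg] = std::move(it->second);
      temp_coll.erase(it);
      return true;
    }
    }
    return false;
  }

  // Restart: clear temp, then replay the journal starting at index `from`.
  void replay(const std::vector<Op>& journal, std::size_t from) {
    assert(from <= journal.size());
    clear_temp();
    for (std::size_t i = from; i < journal.size(); ++i)
      apply(journal[i]);  // rejected ops are skipped
  }
};
```

If replay starts after the create, every later write to that temp object is 
rejected, so the final move finds no source and the main collection is left 
untouched.  If replay covers the whole sequence, the move sees a complete 
object.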

The DBObjectMap I'm very fuzzy on, so I suspect that's where the tricky 
part will be.  Maybe the temp object name includes the intended hobject_t 
in it somewhere, or something, so that the rename can be reflected 
in leveldb at the end.
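
One possible shape for that naming idea, purely as a sketch: the 
"temp_<seq>_<target>" format and both helper names are invented here, and a 
plain string stands in for whatever encoding of hobject_t would really be 
used.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Build a unique temp name that carries the intended target name.
// `seq` is assumed to be some per-OSD unique counter (hypothetical).
std::string make_temp_name(std::uint64_t seq, const std::string& target) {
  return "temp_" + std::to_string(seq) + "_" + target;
}

// Recover the intended target from a temp name.
// Returns false if `name` does not follow the temp naming format.
bool target_of_temp_name(const std::string& name, std::string* target) {
  const std::string prefix = "temp_";
  if (name.compare(0, prefix.size(), prefix) != 0)
    return false;
  // The target starts after the underscore that follows the sequence number.
  std::size_t sep = name.find('_', prefix.size());
  if (sep == std::string::npos)
    return false;
  *target = name.substr(sep + 1);
  return true;
}
```

With something like this, the code handling the final rename could work out 
which key to update in leveldb from the temp name alone.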

Thoughts?  Maybe we can do a quick hangout this afternoon to make sure 
this will work before I start putting it together...

sage