Yeah, thanks Sage for confirming this.

Regards
Somnath

-----Original Message-----
From: Sage Weil [mailto:sw...@redhat.com]
Sent: Thursday, September 10, 2015 3:04 PM
To: Somnath Roy
Cc: ceph-devel
Subject: Re: Regarding journal replay

On Thu, 10 Sep 2015, Somnath Roy wrote:
> Sage et al.,
> Could you please let me know what will happen during journal replay in this
> scenario?
>
> 1. Say the last committed seq is 3, and after that one more independent
> transaction, seq 4, arrives. Transaction seq 4 does, say: delete xattr,
> delete object, create a new object, set xattr.
>
> 2. Seq 4 is committed to the journal, and halfway through applying it (say
> all deletes are done and the new object is created, but set xattr is not
> done) the system crashes.
>
> 3. During restart OSD will try to replay seq 4.
>
> Now, my understanding is that it will blindly run the entire transaction
> again. But:
>
> 1. The delete will fail since the file doesn't exist.
>
> 2. It will create the new object again even though it is already created,
> and will probably get an "already exists" error (?)
>
> The question is, how will it determine whether the error is due to
> filesystem corruption or a half-executed transaction?
> I saw in the code that we are ignoring these errors during replay; is that
> correct?
> Any information on this will be helpful.

There is a subset of operations where errors (or certain errors) are ignored 
(and expected) during replay.  ENOENT on delete is one of them.
The largest set of them is whitelisted here:

https://github.com/ceph/ceph/blob/master/src/os/FileStore.cc#L2781

but if you grep for 'replaying' you'll see several other instances elsewhere.
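
To illustrate the idea of tolerating expected errors only during replay, here
is a minimal toy sketch in Python. It is not FileStore's code: the table
TOLERATED_ON_REPLAY, the op names, and the functions apply_op,
apply_transaction and demo are all hypothetical, and only the "ENOENT on
delete" case mentioned above is modelled.

```python
import errno
import os
import tempfile

# Hypothetical table: errno values a replayed op is allowed to ignore.
# Only "ENOENT on delete" is taken from the discussion above.
TOLERATED_ON_REPLAY = {
    "remove": {errno.ENOENT},  # object was already deleted before the crash
}


def apply_op(root, op, name):
    """Apply one toy operation to a directory standing in for the store."""
    path = os.path.join(root, name)
    if op == "remove":
        os.unlink(path)
    elif op == "create":
        # O_CREAT without O_EXCL: creating an existing file is a no-op,
        # so this toy op is naturally safe to replay.
        os.close(os.open(path, os.O_CREAT | os.O_WRONLY))
    else:
        raise ValueError(f"unknown op: {op}")


def apply_transaction(root, ops, replaying):
    """Run all ops; during replay, swallow errors listed in the table."""
    ignored = []
    for op, name in ops:
        try:
            apply_op(root, op, name)
        except OSError as e:
            if replaying and e.errno in TOLERATED_ON_REPLAY.get(op, ()):
                ignored.append((op, name, errno.errorcode[e.errno]))
                continue
            raise  # outside replay, or an unexpected error: propagate
    return ignored


def demo():
    with tempfile.TemporaryDirectory() as root:
        open(os.path.join(root, "old_obj"), "w").close()
        txn = [("remove", "old_obj"), ("create", "new_obj")]
        apply_transaction(root, txn, replaying=False)  # applied once
        # Pretend we crashed before the commit point advanced: replay it.
        ignored = apply_transaction(root, txn, replaying=True)
        return ignored, sorted(os.listdir(root))
```

Calling demo() applies the transaction once and then replays it; the second
pass hits ENOENT on the remove and records it instead of failing. The same
ENOENT with replaying=False is raised to the caller.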

Sadly you can't tell if these are happening because of the timing of the crash 
or because of some other corruption... the combination of a write-ahead 
transaction log and POSIX is far from ideal.  In general, though, operations 
are all safe to replay.  In the cases where they are not, there is the 
replay_guard machinery to prevent certain things from being replayed.
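
The general idea of a replay guard can be sketched with a toy in-memory model.
This is an assumption-laden illustration, not the actual replay_guard code:
the ToyStore class, its append method, and the choice of "append" as the
non-idempotent operation are all hypothetical.

```python
class ToyStore:
    """In-memory stand-in for an object store with a per-object guard."""

    def __init__(self):
        self.data = {}   # object name -> bytes
        self.guard = {}  # object name -> seq of the last guarded op applied

    def append(self, seq, name, payload, replaying=False):
        """Append is not idempotent, so replay must not apply it twice."""
        if replaying and self.guard.get(name, -1) >= seq:
            return False  # guard says this seq was already applied: skip
        self.data[name] = self.data.get(name, b"") + payload
        self.guard[name] = seq  # remember how far this object has advanced
        return True


store = ToyStore()
store.append(4, "obj", b"abc")                      # applied before the crash
skipped = not store.append(4, "obj", b"abc", replaying=True)  # replay of seq 4
applied = store.append(5, "obj", b"de", replaying=True)       # newer seq runs
```

Here the replay of seq 4 is skipped because the guard on "obj" already records
seq 4, while seq 5 is applied normally. A real store would also have to
persist the guard and the data in a safe order, which this toy ignores.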

sage



