Re: [Gluster-devel] Wrong behavior on fsync of md-cache ?

Xavier Hernandez Tue, 25 Nov 2014 00:35:50 -0800

On 11/25/2014 07:38 AM, Raghavendra Gowdappa wrote:

----- Original Message -----

From: "Xavier Hernandez" <xhernan...@datalab.es>
To: "Raghavendra Gowdappa" <rgowd...@redhat.com>
Cc: "Gluster Devel" <gluster-devel@gluster.org>, "Emmanuel Dreyfus" 
<m...@netbsd.org>
Sent: Tuesday, November 25, 2014 12:49:03 AM
Subject: Re: Wrong behavior on fsync of md-cache ?


I think the problem is here: the first thing wb_fsync()
checks is whether there's an error in the fd (wb_fd_err()). If that's the
case, the call is immediately unwound with that error. The error seems
to be set in wb_fulfill_cbk(). I don't know the internals of the write-back
xlator, but this seems to be the problem.


Yes, your analysis is correct. Once the error is hit, fsync is not
queued behind unfulfilled writes. Whether it can be considered a bug
is debatable. Since there is already an error in one of the writes which
was written-behind, fsync should return the error. I am not sure whether
it should wait till we try to flush _all_ the writes that were written
behind. Any suggestions on what is the expected behaviour here?

I think that it should wait for all pending writes. In the test case I
used, all pending writes will fail the same way as the first one, but
in other situations it's possible to have a write failing (for example
due to a damaged block on disk) and following writes succeeding.


From the man page of fsync:

    fsync() transfers ("flushes") all modified in-core data of (i.e.,
    modified buffer cache pages for) the file referred to by the file
    descriptor fd to the disk device (or other permanent storage
    device) so that all changed information can be retrieved even after
    the system crashed or was rebooted. This includes writing through
    or flushing a disk cache if present. The call blocks until the
    device reports that the transfer has completed. It also flushes
    metadata information associated with the file (see stat(2)).

As I understand it, when fsync is received all queued writes must be
sent to the device (regardless of whether a previous write has failed).
It also says that the call blocks until the device has finished all the
operations.
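
To make the difference between the two behaviours concrete, here is a small stand-alone model in C. It is only a sketch and not the write-behind code: the structures and the model_* functions are hypothetical names invented for illustration. It contrasts unwinding fsync as soon as an error is recorded on the fd with draining every queued write first and then reporting the first error.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of one written-behind write (not a GlusterFS type). */
struct pending_write {
    int result;   /* 0 on success, or a positive errno value on failure */
    int flushed;  /* set to 1 once the write has been attempted */
};

/* Hypothetical model of the per-fd write-behind state. */
struct wb_model {
    struct pending_write *queue;
    size_t count;
    int first_error;  /* first errno seen so far, 0 if none */
};

/* Attempt one queued write and remember the first failure. */
void model_flush_one(struct wb_model *wb, size_t i)
{
    wb->queue[i].flushed = 1;
    if (wb->queue[i].result != 0 && wb->first_error == 0)
        wb->first_error = wb->queue[i].result;
}

/* Fail-fast variant: if an error is already recorded on the fd,
 * return it immediately and leave the remaining writes queued. */
int model_fsync_fail_fast(struct wb_model *wb)
{
    if (wb->first_error != 0)
        return -wb->first_error;
    for (size_t i = 0; i < wb->count; i++)
        if (!wb->queue[i].flushed)
            model_flush_one(wb, i);
    return wb->first_error ? -wb->first_error : 0;
}

/* Drain variant: attempt every queued write regardless of earlier
 * failures, then report the first error (or 0 if none failed). */
int model_fsync_drain(struct wb_model *wb)
{
    for (size_t i = 0; i < wb->count; i++)
        if (!wb->queue[i].flushed)
            model_flush_one(wb, i);
    return wb->first_error ? -wb->first_error : 0;
}
```

In this model both variants return the same error to the caller; the only difference is whether the writes queued after the failed one have been attempted by the time fsync returns.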

However it's not clear to me how to control file consistency because
this allows some writes to succeed after a failed one. I assume that
controlling this is the responsibility of the calling application that
should issue fsyncs on critical points to guarantee consistency.
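
For illustration, an application that wants such a consistency point might do something like the following. This is a minimal sketch using only POSIX write() and fsync(); the helper name write_and_sync is made up for this example, and it assumes the caller treats any non-zero return as "the data up to this point is not guaranteed to be on disk".

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write the whole buffer, then fsync.
 * Returns 0 on success or a negative errno value on failure. */
int write_and_sync(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing: retry */
            return -errno;      /* immediate write error */
        }
        p += n;                 /* handle short writes */
        len -= (size_t)n;
    }

    /* A write that was accepted into a cache may only fail later;
     * the fsync result is where such a deferred error is reported. */
    if (fsync(fd) < 0)
        return -errno;

    return 0;
}
```

The application would call this at each critical point and stop (or recover) on the first non-zero result, instead of assuming that a successful write() means the data reached the disk.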

Anyway it seems that there's a difference between Linux and NetBSD
because this test only fails on NetBSD. Is it possible that Linux's FUSE
implementation delays the fsync request until all pending writes have
been answered? This would explain why this problem has not manifested
until now. NetBSD seems to send fsync (probably as the first step of a
close() call) when the first write fails.


Xavi
