Re: [Gluster-devel] Eager-lock and nfs graph generation

Pranith Kumar K Tue, 19 Feb 2013 18:08:31 -0800

On 02/20/2013 07:03 AM, Anand Avati wrote:

On Tue, Feb 19, 2013 at 5:12 PM, Anand Avati <anand.av...@gmail.com<mailto:anand.av...@gmail.com>> wrote:




    On Tue, Feb 19, 2013 at 3:59 AM, Pranith Kumar K
    <pkara...@redhat.com <mailto:pkara...@redhat.com>> wrote:

        On 02/19/2013 11:26 AM, Anand Avati wrote:


        Thinking over this, looks like there is a problem!

        Write-behind guarantees: That a second write request arriving
        after the acknowledgement of a first overlapping request
        (whether written-behind or otherwise) will be guaranteed to
        be fulfilled in the backend in the same order (i.e, the
        second overlapping request will be "serialized" behind the
        first one in the fulfillment process)

        Eager-lock requirement: That write-behind will send no two
        write requests on an overlapping region at the same time.

        The requirement-set and guarantee-set have a big overlap, but
        the requirement-set is not a subset.

        This is because of O_SYNC writes. write-behind performs
        write-serialization at fulfillment only for written behind
        requests (which get covered under the conflict detection code
        during liability fulfillment). However, if two threads (or
        apps) issue overlapping O_SYNC writes to the same region at
        approximately the same time, then write-behind will let both
        of them go by without any kind of serialization, into eager
        lock, violating the assumptions!

        I'm wondering if it is a safer idea to implement overlap
        checks within eager-lock code itself rather than depend on
        write-behind :|
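
A minimal sketch of what such an overlap check could look like, written in Python for brevity rather than as AFR code. The names `overlaps` and `OverlapGate` are hypothetical and do not come from the GlusterFS source; the sketch only assumes that each write is described by an (offset, length) pair and that a write overlapping an in-flight or earlier-held write must wait.

```python
from collections import deque


def overlaps(a_off, a_len, b_off, b_len):
    """True if byte ranges [a_off, a_off+a_len) and [b_off, b_off+b_len) intersect."""
    return a_off < b_off + b_len and b_off < a_off + a_len


class OverlapGate:
    """Hypothetical gate: never lets two overlapping writes be in flight together."""

    def __init__(self):
        self.in_flight = []      # (offset, length) of writes already wound
        self.waiting = deque()   # held writes, in arrival order

    def submit(self, offset, length):
        """Return True if the write may be wound now, False if it is held."""
        blockers = self.in_flight + list(self.waiting)
        if any(overlaps(offset, length, o, l) for o, l in blockers):
            self.waiting.append((offset, length))
            return False
        self.in_flight.append((offset, length))
        return True

    def complete(self, offset, length):
        """Mark a write as done; return the held writes released by it, in order."""
        self.in_flight.remove((offset, length))
        released = []
        still_waiting = deque()
        for w in self.waiting:
            # A held write stays blocked by in-flight writes and by
            # earlier held writes that are themselves still waiting.
            blockers = self.in_flight + list(still_waiting)
            if any(overlaps(w[0], w[1], o, l) for o, l in blockers):
                still_waiting.append(w)
            else:
                self.in_flight.append(w)
                released.append(w)
        self.waiting = still_waiting
        return released
```

Because a held write is also checked against earlier held writes, overlapping requests are released in arrival order, which is the ordering property the eager-lock requirement above depends on.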

        Avati


        On Mon, Feb 11, 2013 at 10:07 PM, Anand Avati
        <anand.av...@gmail.com <mailto:anand.av...@gmail.com>> wrote:



            On Mon, Feb 11, 2013 at 9:32 PM, Pranith Kumar K
            <pkara...@redhat.com <mailto:pkara...@redhat.com>> wrote:

                hi,
                Please note that this is a case in theory and I did
                not run into such situation, but I feel it is
                important to address this.
                Configuration with "eager-lock on" and "write-behind
                off" should not be allowed, as it leads to lock
                synchronization problems which lead to data
                inconsistency among replicas in nfs.
                Let's say bricks b1, b2 are in replication.
                Gluster NFS server uses 1 anonymous fd to perform all
                write-fops. If eager-lock is enabled in afr, the
                fd's address is used as the lock-owner, which will be
                the same for all write-fops, so there will never be any
                inodelk contention. If write-behind is disabled,
                there can be writes that overlap. (Does nfs make
                sure that the ranges don't overlap?)

                Now imagine the following scenario:
                Let's say w1, w2 are 2 write fops on the same offset and
                length, w1 with all '0's and w2 with all '1's. If
                these 2 write fops are executed in 2 different
                threads, the order of arrival of write fops on b1 can
                be w1, w2 whereas on b2 it is w2, w1, leading to data
                inconsistency between the two replicas. The lock
                contention will not happen as both lk-owner and
                transport are the same for these 2 fops.
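
The scenario can be illustrated with a small Python simulation. This is only a sketch under stated assumptions: each brick is modelled as a plain byte buffer, and `apply_writes` is a hypothetical helper, not a GlusterFS function.

```python
def apply_writes(size, writes):
    """Apply (offset, data) writes, in the given order, to a buffer of '-' bytes."""
    buf = bytearray(b"-" * size)
    for offset, data in writes:
        buf[offset:offset + len(data)] = data
    return bytes(buf)


# Two writes on the same offset and length, as in the scenario.
w1 = (0, b"0000")  # all '0's
w2 = (0, b"1111")  # all '1's

b1 = apply_writes(4, [w1, w2])  # brick b1 receives w1, then w2
b2 = apply_writes(4, [w2, w1])  # brick b2 receives w2, then w1

replicas_diverged = b1 != b2
```

With nothing serializing the two writes, b1 ends up holding the data of w2 and b2 the data of w1, so the replicas differ even though both writes succeeded on both bricks.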


            Write-behind has two functions - a) performing operations
            in the background and b) serializing overlapping operations.

            While the problem does exist, the specifics are different
            from what you describe. Since all writes coming in from
            NFS will always use the same anonymous FD, two
            near-in-time/overlapping writes will never contend with
            inodelk() but instead the second write will inherit the
            lock and changelog from the first. In either case, it is
            a problem.

                We can add a check in glusterd for volume set to
                disallow such configuration, BUT by default
                write-behind is off in nfs graph and by default
                eager-lock is on. So we should either turn on
                write-behind for nfs or turn off eager-lock by default.

                Could you please suggest how to proceed with this if
                you agree that I did not miss any important detail
                that makes this theory invalid.


            Loading the write-behind xlator in the NFS graph looks
            like a simpler solution. Eager-locking is crucial for
            replicated NFS write performance.

            Avati

        Shall we disable eager-lock for files opened with O_SYNC, for now?


    Bad news: the problem is slightly worse than just this. Even with
    non-O_SYNC writes, there is a window in write-behind: if a second
    overlapping write request arrives so close to the first that its
    wb_enqueue() happens after the wb_enqueue() of the first write,
    but before any unwind() following the first wb_enqueue() (i.e.
    wb_inode->gen is not bumped), then the two write requests can be
    wound down together to eager lock.

But this has a simple fix - http://review.gluster.org/4550. Disabling eager-locking for O_SYNC files is a bad idea. We absolutely want eager-locking for O_SYNC files. Thinking more..


Avati

Why is disabling eager-lock for O_SYNC files a bad idea? It is acceptable to sacrifice a bit of performance for O_SYNC, isn't it?


Pranith.

_______________________________________________
Gluster-devel mailing list
Gluster-devel@nongnu.org
https://lists.nongnu.org/mailman/listinfo/gluster-devel
