Please check http://review.gluster.org/4551. This should fix all the known gaps in the write-behind/eager-lock interaction. On top of this patch, you can now set a bit in the 'flags' of the writev fop coming out of write-behind and look for it in AFR. That way AFR can be sure that write-behind's 'protection layer' is covering it against concurrent overlapping writes. With this you can actually eliminate all the glusterd/volgen crud of implementing dependencies between the two options.
Avati

On Tue, Feb 19, 2013 at 7:20 PM, Anand Avati <anand.av...@gmail.com> wrote:

>
> On Tue, Feb 19, 2013 at 6:11 PM, Pranith Kumar K <pkara...@redhat.com> wrote:
>
>> On 02/20/2013 07:03 AM, Anand Avati wrote:
>>
>> On Tue, Feb 19, 2013 at 5:12 PM, Anand Avati <anand.av...@gmail.com> wrote:
>>
>>> On Tue, Feb 19, 2013 at 3:59 AM, Pranith Kumar K
>>> <pkara...@redhat.com> wrote:
>>>
>>>> On 02/19/2013 11:26 AM, Anand Avati wrote:
>>>>
>>>> Thinking this over, it looks like there is a problem!
>>>>
>>>> Write-behind guarantee: a second write request arriving after the
>>>> acknowledgement of a first overlapping request (whether written-behind
>>>> or otherwise) is guaranteed to be fulfilled in the backend in the same
>>>> order (i.e., the second overlapping request is "serialized" behind the
>>>> first one in the fulfillment process).
>>>>
>>>> Eager-lock requirement: write-behind never sends two write requests on
>>>> an overlapping region at the same time.
>>>>
>>>> The requirement-set and the guarantee-set overlap heavily, but the
>>>> requirement-set is not a subset of the guarantee-set.
>>>>
>>>> This is because of O_SYNC writes. Write-behind performs
>>>> write-serialization at fulfillment only for written-behind requests
>>>> (which are covered by the conflict detection code during liability
>>>> fulfillment). However, if two threads (or apps) issue overlapping O_SYNC
>>>> writes to the same region at approximately the same time, write-behind
>>>> lets both of them through to eager lock without any serialization,
>>>> violating the assumptions!
>>>>
>>>> I'm wondering if it is a safer idea to implement overlap checks within
>>>> the eager-lock code itself rather than depend on write-behind :|
>>>>
>>>> Avati
>>>>
>>>> On Mon, Feb 11, 2013 at 10:07 PM, Anand Avati <anand.av...@gmail.com> wrote:
>>>>
>>>>> On Mon, Feb 11, 2013 at 9:32 PM, Pranith Kumar K <pkara...@redhat.com> wrote:
>>>>>
>>>>>> hi,
>>>>>> Please note that this is a theoretical case and I did not run into
>>>>>> such a situation, but I feel it is important to address it.
>>>>>> A configuration with "eager-lock on" and "write-behind off" should
>>>>>> not be allowed, as it leads to lock synchronization problems which
>>>>>> lead to data inconsistency among replicas in NFS.
>>>>>> Let's say bricks b1, b2 are in replication.
>>>>>> The Gluster NFS server uses 1 anonymous fd to perform all write fops.
>>>>>> If eager-lock is enabled in AFR, the fd's address is used as the
>>>>>> lock-owner, which will be the same for all write fops, so there will
>>>>>> never be any inodelk contention. If write-behind is disabled, there
>>>>>> can be writes that overlap.
>>>>>> (Does NFS make sure that the ranges don't overlap?)
>>>>>>
>>>>>> Now imagine the following scenario:
>>>>>> Let's say w1, w2 are 2 write fops on the same offset and length, w1
>>>>>> with all '0's and w2 with all '1's. If these 2 write fops are executed
>>>>>> in 2 different threads, the order of arrival of the write fops on b1
>>>>>> can be w1, w2 whereas on b2 it is w2, w1, leading to data
>>>>>> inconsistency between the two replicas. The lock contention will not
>>>>>> happen, as both lk-owner and transport are the same for these 2 fops.
>>>>>>
>>>>>
>>>>> Write-behind has two functions: a) performing operations in the
>>>>> background and b) serializing overlapping operations.
>>>>>
>>>>> While the problem does exist, the specifics are different from what
>>>>> you describe.
>>>>> Since all writes coming in from NFS will always use the same
>>>>> anonymous FD, two near-in-time/overlapping writes will never contend
>>>>> in inodelk(); instead, the second write will inherit the lock and
>>>>> changelog from the first. In either case, it is a problem.
>>>>>
>>>>>> We can add a check in glusterd for volume set to disallow such a
>>>>>> configuration, BUT by default write-behind is off in the NFS graph and
>>>>>> by default eager-lock is on. So we should either turn on write-behind
>>>>>> for NFS or turn off eager-lock by default.
>>>>>>
>>>>>> Could you please suggest how to proceed with this, if you agree that
>>>>>> I did not miss any important detail that makes this theory invalid.
>>>>>>
>>>>>
>>>>> Loading the write-behind xlator in the NFS graph looks like the
>>>>> simpler solution. Eager-locking is crucial for replicated NFS write
>>>>> performance.
>>>>>
>>>>> Avati
>>>>>
>>>>
>>>> Shall we disable eager-lock for files opened with O_SYNC, for now?
>>>>
>>>
>>> Bad news: the problem is slightly worse than just this. Even with
>>> non-O_SYNC writes, write-behind can wind two overlapping write requests
>>> down to eager lock together. This happens when the second request
>>> arrives so close to the first that its wb_enqueue() runs after the
>>> first request's wb_enqueue() but before any unwind() following it
>>> (i.e., wb_inode->gen has not been bumped).
>>>
>>
>> But this has a simple fix: http://review.gluster.org/4550. Disabling
>> eager-locking for O_SYNC files is a bad idea. We absolutely want
>> eager-locking for O_SYNC files. Thinking more...
>>
>> Avati
>>
>> Why is disabling eager-lock for O_SYNC files a bad idea? It is acceptable
>> to sacrifice a bit of performance for O_SYNC, isn't it?
>>
>
> s/bit/quite a bit/. For O_SYNC writes, eager locking is the only saving
> grace in performance, as write-behind stays out of the way completely.
> We would need overlap checks either in AFR or write-behind for O_SYNC
> writes.
>
> Avati
_______________________________________________
Gluster-devel mailing list
Gluster-devel@nongnu.org
https://lists.nongnu.org/mailman/listinfo/gluster-devel