Adding in Eric and Steve. Ric
On Oct 18, 2017 9:35 AM, "Raghavendra G" <raghaven...@gluster.com> wrote: +Brian Foster On Wed, Oct 11, 2017 at 4:11 PM, Raghavendra G <raghaven...@gluster.com> wrote: > We ran into a regression [2][3]. Hence reviving this thread. > > [2] https://bugzilla.redhat.com/show_bug.cgi?id=1500269 > [3] https://review.gluster.org/18463 > > On Thu, Mar 31, 2016 at 1:22 AM, J. Bruce Fields <bfie...@fieldses.org> > wrote: > >> On Mon, Mar 28, 2016 at 04:21:00PM -0400, Vijay Bellur wrote: >> > On 03/28/2016 09:34 AM, FNU Raghavendra Manjunath wrote: >> > > >> > >I can understand the concern. But I think instead of generally >> > >converting all the ESTALE errors ENOENT, probably we should try to >> > >analyze the errors that are generated by lower layers (like posix). >> > > >> > >Even fuse kernel module some times returns ESTALE. (Well, I can see it >> > >returning ESTALE errors in some cases in the code. Someone please >> > >correct me if thats not the case). And aso I am not sure if converting >> > >all the ESTALE errors to ENOENT is ok or not. >> > >> > ESTALE in fuse is returned only for export_operations. fuse >> > implements this for providing support to export fuse mounts as nfs >> > exports. A cursory reading of the source seems to indicate that fuse >> > returns ESTALE only in cases where filehandle resolution fails. >> > >> > > >> > >For fd based operations, I am not sure if ENOENT can be sent or not (as >> > >though the file is unlinked, it can be accessed if there were open fds >> > >on it before unlinking the file). >> > >> > ESTALE should be fine for fd based operations. It would be analogous >> > to a filehandle resolution failing and should not be a common >> > occurrence. >> > >> > > >> > >I feel, we have to look into some parts to check if they generating >> > >ESTALE is a proper error or not. Also, if there is any bug in below >> > >layers fixing which can avoid ESTALE errors, then I feel that would be >> > >the better option. >> > > >> > >> > I would prefer to: >> > >> > 1. Return ENOENT for all system calls that operate on a path. >> > >> > 2. ESTALE might be ok for file descriptor based operations. >> >> Note that operations which operate on paths can fail with ESTALE when >> they attempt to look up a component within a directory that no longer >> exists. >> > > But, "man 2 rmdir" or "man 2 unlink" doesn't list ESTALE as a valid > error. Also rm doesn't seem to handle ESTALE too [3] > > [4] https://github.com/coreutils/coreutils/blob/master/src/remove.c#L305 > > >> Maybe non-creating open("./foo") returning ENOENT would be reasonable in >> this case since that's what you'd get in the local filesystem case, but >> creat("./foo") returning ENOENT, for example, isn't something >> applications will be written to handle. >> >> The Linux VFS will retry ESTALE on path-based systemcalls *one* time, to >> reduce the chance of ESTALE in those cases. > > > I should've anticipated bug [2] due to this comment. My mistake. Bug [2] > is indeed due to kernel not retrying open on receiving an ENOENT error. > Glusterfs sent ENOENT because file's inode-number/nodeid changed but same > path exists. The correct error would've been ESTALE, but due to our > conversion of ESTALE to ENOENT, the latter was sent back to kernel. > We've an application which does very frequent renames (around 10-15 per second). So, single retry by kernel of an open failed with ESTALE is not helping us. @Bruce/Brian, Do you know why VFS chose an approach of retrying instead of a stricter synchronization mechanism using locking? For eg., rename and open could've been synchronized using a lock. For eg., a rough psuedocode for open and rename could've been (please ignore ordering src,dst locks in rename) sys_open () { LOCK (dentry->lock); { lookup path; open (inode) } UNLOCK (dentry->lock; } sys_rename () { LOCK (dst-dentry->lock); { LOCK (src-dentry->lock); { rename (src, dst); } UNLOCK (src-dentry->lock); } UNLOCK (dst-dentry->lock); } @Bruce, With the current retry model, any suggestions on how to handle applications that do frequent renames? > Looking through kernel VFS code, only open *seems* to retry > (do_filep_open). I couldn't find similar logic to other path based syscalls > like rmdir, unlink, stat, chmod etc > > The bugzilla entry that >> tracked those patches might be interesting: >> >> https://bugzilla.redhat.com/show_bug.cgi?id=678544 >> >> > NFS recommends that applications add special code for handling >> > ESTALE [1]. Unfortunately changing application code is not easy and >> > hence it does not come as a surprise that coreutils also does not >> > accommodate ESTALE. >> >> We also need to consider whether the application's handling of the >> ENOENT case could be incorrect for the ESTALE case, with consequences >> possibly as bad as or worse than consequences of seeing an unexpected >> error. >> >> My first intuition is that translating ESTALE to ENOENT is less safe >> than not doing so, because: >> >> - once an ESTALE-unaware application his the ESTALE case, we >> risk a bug regardless of which we return, but if we return >> ESTALE at least the problem should be more obvious to the >> person debugging. >> - for ESTALE-aware applications, the ESTALE/ENOENT distinction >> is useful. >> > > Another place to not convert is for those cases where kernel retries the > operation on seeing an ESTALE. > > I guess we need to think through each operation and we cannot ESTALE to > ENOENT always. > > >> But I haven't really thought through examples. >> >> --b. >> _______________________________________________ >> Gluster-devel mailing list >> Gluster-devel@gluster.org >> http://www.gluster.org/mailman/listinfo/gluster-devel >> > > > > -- > Raghavendra G > regards, -- Raghavendra G _______________________________________________ Gluster-devel mailing list Gluster-devel@gluster.org http://lists.gluster.org/mailman/listinfo/gluster-devel
_______________________________________________ Gluster-devel mailing list Gluster-devel@gluster.org http://lists.gluster.org/mailman/listinfo/gluster-devel