On Fri, Sep 4, 2015 at 2:35 PM, Prashanth Pai <[email protected]> wrote:
> > ----- Original Message -----
> > From: "Raghavendra Gowdappa" <[email protected]>
> > To: [email protected]
> > Cc: [email protected]
> > Sent: Friday, September 4, 2015 12:43:09 PM
> > Subject: [Gluster-devel] [posix-compliance] unlink and access to file through open fd
> >
> > All,
> >
> > POSIX allows access to a file through open fds even if the name associated
> > with the file has been deleted. While this works in glusterfs for most
> > cases, there are some corner cases where we fail.
> >
> > 1. Reboot of brick:
> > ===================
> >
> > With the reboot of a brick, the fd is lost. unlink would have deleted both
> > the gfid and path links to the file, so we would lose the file. As a
> > solution, perhaps we should create a hardlink to the file (say in
> > .glusterfs) which gets deleted only when the last fd is closed?
> >
> > 2. Graph switch:
> > ================
> >
> > The issue is captured in bz 1259995 [1]. Pasting the content from the bz
> > verbatim:
> >
> > Consider the following sequence of operations:
> > 1. fd = open ("/mnt/glusterfs/file");
> > 2. unlink ("/mnt/glusterfs/file");
> > 3. Do a graph switch, say by adding a new brick to the volume.
> > 4. Migration of the fd to the new graph fails. This is because, as part of
> > migration, we do a lookup and open; but the lookup fails since the file is
> > already deleted, hence migration fails and the fd is marked bad.
> >
> > In fact this test case is already present in our regression tests, though
> > the test only checks that the fd is marked bad. The expectation behind
> > filing this bug is that migration should succeed. This is possible since
> > there is an fd opened on the brick through the old graph, which can be
> > duplicated using the dup syscall.
> >
> > Of course, the solution outlined here doesn't cover the case where the
> > file is not present on a brick at all. E.g., a new brick was added to a
> > replica set and that new brick doesn't contain the file.
> > Now, since the file is deleted, how does replica heal that file to the
> > other bricks, etc.?
> >
> > But at least this can be solved for those cases where the file was
> > present on a brick and an fd was already opened.
> >
> > 3. Open-behind and unlink from a different client:
> > ==================================================
> >
> > While open-behind handles unlink from the same client (through which the
> > open was performed), if the unlink and open are done from two different
> > clients, the file is lost. I cannot think of any good solution for this.

> We *may* have hit this once earlier when we had multiple instances of the
> object-expirer daemon deleting a huge number of objects (files). This was
> only observed at scale - deleting a million objects. Our user-space
> application flow was roughly as follows:
>
> fd = open(...)
> s = stat(fd)
> fgetxattr(fd, ....)
>
> In our case, open() and stat() succeeded but fgetxattr() failed with
> ENOENT (many times with ESTALE too), probably because some other client
> had already done an unlink() on the file name. Is this behavior normal?

It's possible (though it may not be normal, since we are being
non-POSIX-compliant here :)).

1. The open might have been serviced by open-behind (faking it).
2. The fstat might have been served from md-cache (if it had hit
open-behind, open-behind would have done an open before the fstat
completed).
3. For fgetxattr, if it hits open-behind and the file has already been
deleted from some other client, fgetxattr will fail with ESTALE (not
ENOENT, since the open is done on the gfid, and if the gfid cannot be
looked up, server-resolver sends out ESTALE).

> @Thiago: Remember this one?
> http://paste.openstack.org/show/357414/
> https://gist.github.com/thiagodasilva/491e405a3385f0e85cc9

> > I wanted to know whether these problems are real enough to channel our
> > efforts towards fixing them. Comments are welcome, in terms of solutions
> > or other possible scenarios which can lead to this issue.
> > [1] https://bugzilla.redhat.com/show_bug.cgi?id=1259995
> >
> > regards,
> > Raghavendra.

--
Raghavendra G
_______________________________________________
Gluster-devel mailing list
[email protected]
http://www.gluster.org/mailman/listinfo/gluster-devel
