Re: [Gluster-devel] handling open fds and graph switches

Raghavendra Bhat Tue, 06 Aug 2013 22:55:02 -0700

On 08/07/2013 05:11 AM, Raghavendra G wrote:

On Wed, Aug 7, 2013 at 12:21 AM, Raghavendra Bhat <rab...@redhat.com<mailto:rab...@redhat.com>> wrote:


    On 08/06/2013 05:22 PM, Raghavendra Gowdappa wrote:


        ----- Original Message -----

            From: "Raghavendra Bhat" <rab...@redhat.com
            <mailto:rab...@redhat.com>>
            To: gluster-devel@nongnu.org <mailto:gluster-devel@nongnu.org>
            Sent: Tuesday, August 6, 2013 1:52:40 PM
            Subject: [Gluster-devel] handling open fds and graph switches


            Hi,

            As of now, there is a problem when following set of
            operations are
            performed on a file.

            open () => unlink () => do a graph change (not
            reconfigure) => fop on
            the opened fd (may be write)

            In the above set of operations, the fop performed on the
            fd after the
            graph switch fails with EBADFD (which should not happen).
            Its because
            when the file is unlinked (assume there are no other
            hardlinks for the
            file), the gfid handle present in the .glusterfs directory
            of the brick
            is removed. Now when graph change happens, all fds have to
            be migrated
            to the new graph. Before that a nameless lookup will be
            sent on the gfid
            (to build the new inode in the new graph). The nameless
            lookup happens
            on the gfid handle. But since the gfid handle is removed
            upon receiving
            the unlink, nameless lookup fails, thus failing the fd
            migration to the
            new graph and the fops on the fd are also failed.

            A patch has been sent to handle
            this(http://review.gluster.org/#/c/5428/), where the gfid
            handle is
            removed when the last reference to the file is removed
            (i.e upon getting
            the unlink, it also checks whether there are any open fds
            on the inode.
            If so, then the gfid handle is not removed. Its removed
            when release on
            that fd is received). But that approach might lead to gfid
            handle leaks
            (what if glusterfsd crashes upon unlinking the last entry?
            the gfid
            handle might not have been removed if there are open fds.
            And now if
            glusterfsd crashes, then the gfid handle for that file is
            leaked).

            Another approach might be to make posix_lookup do a stat
            on one of the
            fds present on the inode when it has to build a INODE
            HANDLE (which
            happens as part of nameless lookup). The nameless lookup
            suceeds and the
            new inode is looked up in the new graph for the client.
            But after that,
            there are 2 more issues.

            1) After successful completion of the nameless lookup, the
            file has to
            be opened in the new graph. So a syncop_open is sent on
            the new graph
            for the gfid. In posix_open, posix xlator again tries to
            open the file
            using the gfid handle. But since the gfid handle is
            removed, open fails
            and the file is not opened (thus fd migration fails
            again.) We can
            search the list of fds for the inode, find the right fd
            that the fuse
            client is trying to migrate and return that fd. But
            searching the right
            fd is a hard task. (What if a fuse client has opened 2 fds
            with same flags?)

        If there is more than one posix fd (fd opened on backend
        filesystem) with same flags, its not really an issue. For our
        purposes it doesn't make any difference. Within glusterfs
        we'll anyways be using a different fd object (to maintain lock
        state etc). At the posix level all we need is an fd opened
        with correct flags. We can dup one of these (posix) fds and
        associate the duped fd with glusterfs fd object. Please note
        that returning glfs_fd_object (with a reference) won't work
        here, since the glusterfs fd object we are migrating might
        have different lock state than the one having posix fd opened
        with same flags. We need to dup the posix fd and associate
        that fd with a new glusterfs fd object.

    Ok. Will see this method.

            2) Another problem is open-behind. Fuse xlator after
            nameless lookup,
            sends syncop_open to migrate the fds. Once the syncop_open
            is complete
            and fds are migrated, PARENT_DOWN event is sent on the old
            graph and the
            client xlator sends release on all the fds (if the
            previous syncop_open
            is successful, then its safe to send release from old
            graph as the new
            fd would have been migrated to the new graph, with
            corresponding fd
            present in the brick). But before that in syncop_open,
            open-behind might
            have sent success to the fuse without actually winding the
            open call to
            the below xlators. Now fuse gets success for the open,
            sends PARENT_DOWN
            to old graph, which sends release on the fd. Thus even
            though a fd is
            present from application's point of view, there are no
            mechanisms to
            access the file (as the fds and gfid handles have been
            removed already.)

        Introduce a key in xdata "force-open" in open fop and if that
        key is set, make open-behind to not to delay open.

    But the problem is syncop_open () does not send any dictionary (it
    will be NULL). We can make open-behind
    check whether xdata is NULL and if so, consider that open call be
    generated internally (not from application) and wind it to the
    below xlator.

Hmm.. I am not too sure whether we can rely on the interpretation thatxdata being NULL means to force open in open-behind. There definitelyare/will be other use-cases of syncop-open where some mightinadvertently leave xdata NULL. It always helps in terms ofunderstandability, to be explicit on what we want to do. Can't youcreate an xdata in fuse fd migration code and pass that down tosyncop-open?

Whoever calls syncop_open does not send the xdata as the arugement atall. It will be like this.

ret = syncop_open (new_subvol, &loc, flags, newfd);

The syncop framework itself sends the xdata as NULL while winding thecall (making syncop framework allocate a new dict before winding andsend it as an argument also wont work in this case, as fuse wont be ableto set any new key).





            Please provide feedback on the above issues.


            Regards,
            Raghavendra Bhat


            _______________________________________________
            Gluster-devel mailing list
            Gluster-devel@nongnu.org <mailto:Gluster-devel@nongnu.org>
            https://lists.nongnu.org/mailman/listinfo/gluster-devel



    _______________________________________________
    Gluster-devel mailing list
    Gluster-devel@nongnu.org <mailto:Gluster-devel@nongnu.org>
    https://lists.nongnu.org/mailman/listinfo/gluster-devel




--
Raghavendra G

_______________________________________________
Gluster-devel mailing list
Gluster-devel@nongnu.org
https://lists.nongnu.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] handling open fds and graph switches

Reply via email to