Re: [Gluster-devel] RFC/Review: libgfapi object handle based extensions

Shyamsundar Ranganathan Mon, 30 Sep 2013 03:41:12 -0700

Avati, Amar, 

Amar, Anand S and myself had a discussion on this comment and here is an answer 
to your queries the way I see it. Let me know if I am missing something here.

(this is not a NFS Ganesha requirement, FYI. As Ganesha will only do a single 
lookup or preserve a single object handle per filesystem object in its cache) 

Currently a glfs_object is an opaque pointer to an object (it is a _handle_ to 
the object). The object itself contains a ref'd inode, which is the actual 
pointer to the object. 

1) The similarity and differences of object handles to fds 

The intention of multiple object handles is in lines with multiple fd's per 
file, an application using the library is free to lookup (and/or create (and 
its equivalents)) and acquire as many object handles as it wants for a 
particular object, and can hence determine the lifetime of each such object in 
its view. So in essence one thread can have an object handle to perform, say 
attribute related operations, whereas another thread has the same object looked 
up to perform IO. 

Where the object handles depart from the notion of fds is when an unlink is 
performed. As POSIX defines that open fds are still _open_ for activities on 
the file, the life of an fd and the actual object that it points to is till the 
fd is closed. In the case of object handles though, the moment any handle is 
used to unlink the object (which BTW is done using the parent object handle and 
the name of the child), all handles pointing to the object are still valid 
pointers, but operations on then will result in ENOENT, as the actual object 
has since been unlinked and removed by the underlying filesystem. 

The departure from fds is considered valid in my perspective, as the handle 
points to an object, which has since been removed, and so there is no semantics 
here that needs it to be preserved for further operations as there is a 
reference to it held. 

So in essence for each time an object handle is returned by the API, it has to 
be closed for its life to end. Additionally if the object that it points to is 
removed from the underlying system, the handle is pointing to an entry that 
does not exist any longer and returns ENOENT on operations using the same. 

2) The issue/benefit of having the same object handle irrespective of looking 
it up multiple times 

If we have an 1-1 relationship of object handles (i.e struct glfs_object) to 
inodes, then the caller gets the same pointer to the handle. Hence having 
multiple handles as per the caller, boils down to giving out ref counted 
glfs_object(s) for the same inode. 

Other than the memory footprint, this will still not make the object live past 
it's unlink time. The pointer handed out will be still valid till the last ref 
count is removed (i.e the object handle closed), at which point the object 
handle can be destroyed. 

So again, as many handles were handed out for the same inode, they have to be 
closed, etc. 

3) Graph switches 

In the case of graph switches, handles that are used in operations post the 
switch, get refreshed with an inode from the new graph, if we have an N:1 
object to inode relationship. 

In the case of 1:1 this is done once, but is there some multi thread safety 
that needs to be in place? I think this is already in place from the 
glfs_resolve_inode implementation as suggested earlier, but good to check. 

4) Renames 

In the case of renames, the inode remains the same, hence all handed out object 
handles still are valid and will operate on the right object per se. 

5) unlinks and recreation of the same _named_ object in the background 

Example being, application gets an handle for an object, say named "a.txt", and 
in the background (or via another application/client) this is deleted and 
recreated. 

This will return ENOENT as the GFID would have changed for the previously held 
object to the new one, even though the names are the same. This seems like the 
right behaviour, and does not change in the case of a 1:1 of an N:1 object 
handle to inode mapping. 

So bottom line, I see the object handles like an fd with the noted difference 
above. Having them in a 1:1 relationship or as a N:1 relationship does not seem 
to be an issue from what I understand, what am I missing here? 

Shyam 

----- Original Message -----

> From: "Anand Avati" <av...@gluster.org>
> To: "Shyamsundar Ranganathan" <srang...@redhat.com>
> Cc: "Gluster Devel" <gluster-devel@nongnu.org>
> Sent: Monday, September 30, 2013 10:35:05 AM
> Subject: Re: RFC/Review: libgfapi object handle based extensions

> I see a pretty core issue - lifecycle management of 'struct glfs_object'.
> What is the structure representing? When is it created? When is it
> destroyed? How does it relate to inode_t?

> Looks like for every lookup() we are creating a new glfs_object, even if the
> looked up inode was already looked up before (in the cache) and had a
> glfs_object created for it in the recent past.

> We need a stronger relationship between the two with a clearer relationship.
> It is probably necessary for a glfs_object to represent mulitple inode_t's
> at different points in time depending on graph switches, but for a given
> inode_t we need only one glfs_object. We definitely must NOT have a new
> glfs_object per lookup call.

> Avati

> On Thu, Sep 19, 2013 at 5:13 AM, Shyamsundar Ranganathan <
> srang...@redhat.com > wrote:

> > Avati,
> 

> > Please find the updated patch set for review at gerrit.
> 
> > http://review.gluster.org/#/c/5936/
> 

> > Changes made to address the points (1) (2) and (3) below. By the usage of
> > the
> > suggested glfs_resolve_inode approach.
> 

> > I have not yet changes glfs_h_unlink to use the glfs_resolve_at. (more on
> > this a little later).
> 

> > So currently, the review request is for all APIs other than,
> 
> > glfs_h_unlink, glfs_h_extract_gfid, glfs_h_create_from_gfid
> 

> > glfs_resolve_at: Using this function the terminal name will be a force look
> > up anyway (as force_lookup will be passed as 1 based on !next_component).
> > We
> > need to avoid this _extra_ lookup in the unlink case, which is why all the
> > inode_grep(s) etc. were added to the glfs_h_lookup in the first place.
> 

> > Having said the above, we should still leverage glfs_resolve_at anyway, as
> > there seem to be other corner cases where the resolved inode and subvol
> > maybe from different graphs. So I think I want to modify glfs_resolve_at to
> > make a conditional force_lookup, based on iatt being NULL or not. IOW,
> > change the call to glfs_resolve_component with the conditional as, (reval
> > ||
> > (!next_component && iatt)). So that callers that do not want the iatt
> > filled, can skip the syncop_lookup.
> 

> > Request comments on the glfs_resolve_at proposal.
> 

> > Shyam.
> 

> > ----- Original Message -----
> 

> > > From: "Anand Avati" < av...@gluster.org >
> 
> > > To: "Shyamsundar Ranganathan" < srang...@redhat.com >
> 
> > > Cc: "Gluster Devel" < gluster-devel@nongnu.org >
> 
> > > Sent: Wednesday, September 18, 2013 11:39:27 AM
> 
> > > Subject: Re: RFC/Review: libgfapi object handle based extensions
> 

> > > Minor comments are made in gerrit. Here is a larger (more important)
> > > comment
> 
> > > for which email is probably more convenient.
> 

> > > There is a problem in the general pattern of the fops, for example
> 
> > > glfs_h_setattrs() (and others too)
> 

> > > 1. glfs_validate_inode() has the assumption that object->inode deref is a
> 
> > > guarded operation, but here we are doing an unguarded deref in the
> > > paramter
> 
> > > glfs_resolve_base().
> 

> > > 2. A more important issue, glfs_active_subvol() and glfs_validate_inode()
> > > are
> 
> > > not atomic. glfs_active_subvol() can return an xlator from one graph, but
> > > by
> 
> > > the time glfs_validate_inode() is called, a graph switch could have
> > > happened
> 
> > > and inode can get resolved to a different graph. And in syncop_XXXXXX()
> > > we
> 
> > > end up calling on graph1 with inode belonging to graph2.
> 

> > > 3. ESTALE_RETRY is a fundamentally wrong thing to do with handle based
> 
> > > operations. The ESTALE_RETRY macro exists for path based FOPs where the
> 
> > > resolved handle could have turned stale by the time we perform the FOP
> 
> > > (where resolution and FOP are non-atomic). Over here, the handle is
> 
> > > predetermined, and it does not make sense to retry on ESTALE (notice that
> > > FD
> 
> > > based fops in glfs-fops.c also do not have ESTALE_RETRY for this same
> 
> > > reason)
> 

> > > I think the pattern should be similar to FD based fops which specifically
> 
> > > address both the above problems. Here's an outline:
> 

> > > glfs_h_XXXX(struct glfs *fs, glfs_object *object, ...)
> 
> > > {
> 
> > > xlator_t *subvol = NULL;
> 
> > > inode_t *inode = NULL;
> 

> > > __glfs_entry_fs (fs);
> 

> > > subvol = glfs_active_subvol (fs);
> 
> > > if (!subvol) { errno = EIO; ... goto out; }
> 

> > > inode = glfs_resolve_inode (fs, object, subvol);
> 
> > > if (!inode) { errno = ESTALE; ... goto out; }
> 

> > > loc.inode = inode;
> 
> > > ret = syncop_XXXX(subvol, &loc, ...);
> 

> > > }
> 

> > > Notice the signature of glfs_resolve_inode(). What it does: given a
> 
> > > glfs_object, and a subvol, it returns an inode_t which is resolved on
> > > that
> 
> > > subvol. This way the syncop_XXX() is performed with matching subvol and
> 
> > > inode. Also it returns the inode pointer so that no unsafe object->inode
> 
> > > deref is done by the caller. Again, this is the same pattern followed by
> > > the
> 
> > > fd based fops already.
> 

> > > Also, as mentioned in one of the comments, please consider using
> 
> > > glfs_resolve_at() and avoiding manual construction of loc_t.
> 

> > > Thanks,
> 
> > > Avati
>

_______________________________________________
Gluster-devel mailing list
Gluster-devel@nongnu.org
https://lists.nongnu.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] RFC/Review: libgfapi object handle based extensions

Reply via email to