In my cursory examination of CephFS, it appears that the offset is some kind of sequential count stored in the dentries. This means the offset for a given dirent can likely change as the directory mutates. It also seems to imply that the offset actually points to the next entry, whatever that is at the time of lookup, and has no real relation to the current entry.
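
To make the consequence concrete, here is a minimal toy model, not CephFS code. It assumes the offset is nothing more than a position in a list (the hypothetical `readdir` helper and the list-based directory are illustrative only) and shows what a saved cookie does once the directory mutates between two READDIR calls.

```python
def readdir(entries, whence, count):
    """Return up to `count` (name, cookie) pairs starting at position `whence`.

    Assumption for illustration: the cookie is a plain sequential count,
    i.e. the position of the *next* entry, not an identity of this one.
    """
    chunk = entries[whence:whence + count]
    return [(name, whence + i + 1) for i, name in enumerate(chunk)]


directory = ["a", "c", "d"]

# First chunk: the client remembers the cookie of the last entry it saw.
first = readdir(directory, 0, 2)
resume_cookie = first[-1][1]

# The directory mutates between the two READDIR calls.
directory.insert(1, "b")

# Resuming from the saved cookie now lands on a different entry.
second = readdir(directory, resume_cookie, 2)
seen = [name for name, _ in first] + [name for name, _ in second]
```

In this model the client sees "c" twice and never sees "b", because the saved cookie only meant "whatever sits at position 2" at the time it was handed out.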
Daniel

On 03/21/2017 08:09 PM, Frank Filz wrote:
> I am having challenges getting dirent chunking to work correctly in all
> scenarios...
>
> From the client side we have the following requirements:
>
> NFS client will send a READDIR request with a whence that may be non-zero
> NFS client is returned entries, each entry has a "cookie" that may be used
> as whence on a subsequent READDIR to start fetching entries starting with
> the entry following the one the cookie was associated with
> 9P seems to have the same requirements
>
> The above matches well with lseek and getdents (which is what FSAL_VFS uses)
>
> For FSAL_RGW, we would like the cookie to be the "address" of the entry
> rather than the next entry
> Which allows us to compute the cookie for an inserted dirent (from lookup,
> create, link, or rename)
>
> For continuing to read a directory having read some number of chunks in, we
> would like to use a whence that will find the next directory entry after the
> last one in the previous chunk.
>
> Now here is one problem, for FSAL_VFS if we use the d_off as the cookie,
> that is actually the "address" of the next entry AT THAT TIME. That means
> that if we do a lseek to the last cookie in a chunk, we may NOT find the
> actual next entry. There may also be an issue due to . and .. sorting
> somewhere in the middle of the directory (at least on my ext4 filesystem,
> the "address" of . is always 0x4c470ee8300a65ab (which means that will be
> the d_off for whichever entry precedes .) and .. is always
> 0x68ec4bc2e1982399).
>
> If we aren't trying to insert dirents, that may be ok. If so, we can
> probably live with RGW cookies being the address of the entry while VFS
> cookies are the address of the current next entry, and so long as those
> FSALs which return cookie as the address of the entry, do indeed provide the
> NEXT entry when we provide that cookie as whence on readdir, everything
> should work.
>
> But I'm also trying to test the dirent insert using FSAL_VFS, and it isn't
> working...
>
> The problem is an insert that becomes the new first directory entry, or an
> insert that slips in just before the . or .. entries.
>
> In order to make a workable ability to insert dirents, FSAL_VFS readdir
> COULD return the previous cookie as the cookie for an entry. In that case,
> after doing an lseek, it would just have to skip the first entry. For ext4
> it MIGHT work to actually lseek to whence+1...
>
> FSAL_VFS compute_readdir_cookie would of course just return the d_off from
> the entry prior to finding the named entry.
>
> Then one problem remains for FSAL_VFS. We can't get the actual "address" of
> the very first dirent. This could be handled by the following mechanism:
>
> If we insert a new dirent, and compute_readdir_cookie returns 0 for it, we
> must then call compute_readdir_cookie on the previous first entry (which
> will return its actual address now that it no longer is the first entry in
> the directory), and move it in the AVL tree so we can now insert the new 0.
>
> It would really help to understand how Gluster and Ceph readdir with a
> non-zero whence actually works, how do your cookies work?
>
> How much of a problem do you feel chunking possibly missing new entries in a
> directory really is? Note that if we decide our current attributes are
> invalid, refresh them, and detect mtime changes, then we will flush the
> dirents, so this MAY not be that much of an issue. On the other hand, it also
> means that even if we dump the dirent cache, a client that doesn't give up,
> and sends a non-zero whence may miss entries that folks feel it should have
> found.
>
> Thanks
>
> Frank
> _______________________________________________
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
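
The two cookie conventions in the quoted message (cookie as the d_off-style "address" of the next entry versus cookie as the "address" of the entry itself, followed by skipping that entry after the seek) can be contrasted in a small toy model. This is only a sketch under stated assumptions: `ToyDir`, its methods, the integer addresses, and the `END` marker are all hypothetical and are not FSAL_VFS, FSAL_RGW, or ext4 code. In particular, skipping the first entry only when its address matches the cookie is an assumption of this sketch.

```python
import bisect

END = 2**63 - 1  # hypothetical end-of-directory marker


class ToyDir:
    """Toy directory whose entries are ordered by a fixed integer 'address'."""

    def __init__(self):
        self.addrs = []   # sorted list of entry addresses
        self.names = {}   # address -> entry name

    def insert(self, addr, name):
        bisect.insort(self.addrs, addr)
        self.names[addr] = name

    def d_off(self, addr):
        """Address of the entry that follows `addr` at this moment."""
        i = bisect.bisect_right(self.addrs, addr)
        return self.addrs[i] if i < len(self.addrs) else END

    def read_at(self, whence):
        """Model of lseek(whence) + getdents: entries at or after `whence`."""
        i = bisect.bisect_left(self.addrs, whence)
        return [self.names[a] for a in self.addrs[i:]]

    def read_after(self, cookie):
        """Seek to the entry's own address, then skip that entry if present."""
        i = bisect.bisect_left(self.addrs, cookie)
        if i < len(self.addrs) and self.addrs[i] == cookie:
            i += 1
        return [self.names[a] for a in self.addrs[i:]]


d = ToyDir()
d.insert(10, "a")
d.insert(30, "c")

# The client has read "a"; record a cookie for it in both conventions.
next_cookie = d.d_off(10)   # address of the next entry AT THAT TIME
addr_cookie = 10            # address of "a" itself

# A new dirent lands between "a" and "c" before the client resumes.
d.insert(20, "b")

via_next = d.read_at(next_cookie)
via_addr = d.read_after(addr_cookie)
```

Resuming with the next-entry cookie returns only "c" and misses the inserted "b", while resuming with the entry's own address and skipping it returns "b" and then "c".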