On 4/28/15 4:03 AM, Rick Macklem wrote:
Julian Elischer wrote:
On 4/25/15 4:28 AM, John Baldwin wrote:
On Saturday, April 25, 2015 02:36:24 AM Julian Elischer wrote:
On 4/25/15 1:30 AM, Julian Elischer wrote:
On 4/24/15 10:59 PM, John Baldwin wrote:
Index: head/lib/libc/gen/telldir.c
--- head/lib/libc/gen/telldir.c (revision 281929)
+++ head/lib/libc/gen/telldir.c (working copy)
@@ -101,8 +101,10 @@
           if (lp->loc_loc == dirp->dd_loc && lp->loc_seek ==
-       (void) lseek(dirp->dd_fd, (off_t)lp->loc_seek,
-       dirp->dd_seek = lp->loc_seek;
+       if (lp->loc_seek != dirp->dd_seek) {
+               (void) lseek(dirp->dd_fd, (off_t)lp->loc_seek,
+               dirp->dd_seek = lp->loc_seek;
+       }
yes I did that yesterday but it still fails when you transition
blocks.. (badly).

I also tried bigger blocks.. also fails (eventually)

I did find a way to make it work...  you had to seek back
to the first block you deleted on each set..
then work forward from there again..  unfortunately since
I'm trying to make a microsoft program not fail (via samba)
I have no control over how it does things and seekdir doesn't
know what was deleted anyway... (so the fix is fine for  the
test program but not for real life)

I think I can make the BSD one act like the linux one by changing
the lseek being done to use the offset (loc) plus the buffer seek
address of the target, instead of just going for the buffer base
stepping forward through the entries..

maybe tomorrow.

The following conditional code makes ours behave the same as the
it breaks several 'rules' but works where ours is clean but
as Rick said..  "maybe that's what we should do too."

this is at the end of seekdir()

The new code does what linux does.. and shouldn't work.. but does
               // at least in the limited conditions I need it to.
               // We'll probably need to do this at work...:

The original code is what we have now, but gets mightily confused
          // This is clean(er) but fails in specific
doing commands
          // from Microft windows, via samba).

root@vps1:/tmp # diff -u dir.c.orig dir.c
--- dir.c.orig    2015-04-24 11:29:36.855317000 -0700
+++ dir.c    2015-04-24 11:15:49.058500000 -0700
@@ -1105,6 +1105,13 @@
            dirp->dd_loc = lp->loc_loc;
+#ifdef GLIBC_SEEK
+    (void) lseek(dirp->dd_fd, (off_t)lp->loc_seek + lp->loc_loc,
+    dirp->dd_seek = lp->loc_seek + lp->loc_loc;
+    dirp->dd_loc = 0;
+    lp->loc_seek = dirp->dd_seek;
+    lp->loc_loc = 0;
        (void) lseek(dirp->dd_fd, (off_t)lp->loc_seek, SEEK_SET);
        dirp->dd_seek = lp->loc_seek;
        dirp->dd_loc = 0;
@@ -1114,6 +1121,7 @@
            if (dp == NULL)
Yes, this isn't at all safe.  There's no guarantee whatsoever that
the offset on the directory fd that isn't something returned by
getdirentries has any meaning.  In particular, the size of the
directory entry in a random filesystem might be a different size
than the structure returned by getdirentries (since it converts
things into a FS-independent format).

This might work for UFS by accident, but this is probably why ZFS
doesn't work.

However, this might be properly fixed by the thing that ino64 is
doing where each directory entry returned by getdirentries gives
you a seek offset that you _can_ directly seek to (as opposed to
seeking to the start of the block and then walking forward N
entries until you get an inter-block entry that is the same).
I just made the stunning discovery that our seekdir/readdir/telldir
code in libc works with
FreeBSD 8.0.
so maybe the problem is that the kernel changed it's behaviour, and
no-one thought to fix libc..

(at least it works on one of our 8.0 base appliances.. I'll do more
testing tomorrow.. it's past midnight.)

actually that was a mistake.. it fails on 8.0 as much as it fails on 10.x and 11.
the patch above also fixes it on 8.0.  (UFS and ZFS)

I suspect that pre-r252438 systems work better for UFS than r252438
or later. That patch changed ufs_readdir() so that it no longer returned
the on-disk directory structure. (Among other things, it added code that
skipped over d_ino == 0 entries.)
yes but it was broken even before that.
basically here's the difference between what Linux (and mu patched code) does and what we do..

in Linux they seek directly to the base address of the block PLUS the offset of the entry being seeked. The filesystem somehow correctly interprets this as teh start of the correct entry. I guess it must return the correct information in the dirent structure in the first place.

in the unmodified BSD version, we seek to the start of the 'block' (actually the start of the buffer we asked for, which may or may not (probably not) be any special address in the directory (except at 0) and then we step forward records adding up the offsets (lengths) until we get pas thte address we are looking for. Since we do not return empty records, as soon as we get to the place where there ued to be an empty record, we get out of sync with the kernel. It appears that even in 8, deleted records didn't get to userland. The answer would be to pad out earlier records or add pad records at the front of the buffer, but that doesn't seem to be happenning. (it's what USED to be done but I suspect that 254438 was not the only place that was broken.)

As such, r252438 and later systems have UFS where the "logical" offset
of a directory entry returned by getdirentries() isn't the same as the
"physical" offset for it in the on-disk directory.

Having said the above, I have two somewhat inconsistent thoughts:
1 - As jhb has explained, the libc functions aren't safe for telldir()/seekdir()
     when entries are added/deleted. It just happens that UFS might work
     ok (and is more likely to work ok when "logical offset" == "physical 
2 - I'm not sure r252438 was a good idea (at least the part that skips invalid
     d_ino == 0 entries) because I don't think making "logical offset" != "physical 
     is a good idea, if there isn't a good reason to need to do so.

I think it is hard to argue that r252438 broke the libc functions. It just
happens that cases that aren't guaranteed to work happens to work without 

I also think that the use of d_off (or d_cookie, if you prefer that name), which
would be the "physical offset" of the next directory entry is the best bet
for fixing this, in general. (By in general, I mean for all file systems.)
But this will require a new getdirentries(2) syscall and libc functions that
know how to use it.


freebsd-current@freebsd.org mailing list
To unsubscribe, send any mail to

freebsd-current@freebsd.org mailing list
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Reply via email to