On 4/25/15 4:28 AM, John Baldwin wrote:
On Saturday, April 25, 2015 02:36:24 AM Julian Elischer wrote:
On 4/25/15 1:30 AM, Julian Elischer wrote:
On 4/24/15 10:59 PM, John Baldwin wrote:
On Friday, April 24, 2015 01:02:39 PM Julian Elischer wrote:
On 4/23/15 9:54 PM, John Baldwin wrote:
On Thursday, April 23, 2015 05:02:08 PM Julian Elischer wrote:
On 4/23/15 11:20 AM, Julian Elischer wrote:
I'm debugging a problem being seen with samba 3.6.

basically  telldir/seekdir/readdir don't seem to work as
advertised..
ok so it looks like readdir() (and friends) is totally broken in
the face
of deletes unless you read the entire directory at once or reset
to the
the first file before the deletes, or earlier.
I'm not sure that Samba isn't assuming non-portable behavior.
For example:

>From
http://pubs.opengroup.org/onlinepubs/009695399/functions/readdir_r.html


If a file is removed from or added to the directory after the
most recent call
to opendir() or rewinddir(), whether a subsequent call to
readdir() returns an
entry for that file is unspecified.

While this doesn't speak directly to your case, it does note that
you will
get inconsistencies if you scan a directory concurrent with add
and remove.

UFS might kind of work actually since deletes do not compact the
backing
directory, but I suspect NFS and ZFS would not work.  In
addition, our
current NFS support for seekdir is pretty flaky and can't be
fixed without
changes to return the seek offset for each directory entry (I
believe that
the projects/ino64 patches include this since they are breaking
the ABI of
the relevant structures already).  The ABI breakage makes this a
very
non-trivial task.  However, even if you have that per-item
cookie, it is
likely meaningless in the face of filesystems that use any sort
of more
advanced structure than an array (such as trees, etc.) to store
directory
entries.  POSIX specifically mentions this in the rationale for
seekdir:

One of the perceived problems of implementation is that returning
to a given point in a directory is quite difficult to describe
formally, in spite of its intuitive appeal, when systems that use
B-trees, hashing functions, or other similar mechanisms to order
their directories are considered. The definition of seekdir() and
telldir() does not specify whether, when using these interfaces,
a given directory entry will be seen at all, or more than once.

In fact, given that quote, I would argue that what Samba is doing is
non-portable.  This would seem to indicate that a conforming
seekdir could
just change readdir to immediately return EOF until you call
rewinddir.
how does readdir know that the directory has been changed? fstat?
Undefined.  FreeBSD's libc doesn't cache the entire directory
(unless you
are using a union mount), instead it just caches the most recent
call to
getdirentries().  It then serves up entries from that block until
you hit
the end.  When you hit the end it reads the next block, etc. When you
call telldir(), the kernel saves the seek offset from the start of the
current block and the offset within that block and returns a cookie to
you.  seekdir() looks up the cookie to find the (seek offset, block
offset)
tuple.  If the location matches the directory's current location it
doesn't
do anything, otherwise it seeks to the seek offset and reads a new
block via
getdirentries().  There is no check for seeing if a directory is
changed.  Instead, you can only be stale by one "block".  When you
read
a new block it is relative to the directory's state at that time.

Rick's suggestion of reusing the block for a seek within a block
would be
fairly easy to implement, I think it would just be:

Index: head/lib/libc/gen/telldir.c
===================================================================
--- head/lib/libc/gen/telldir.c (revision 281929)
+++ head/lib/libc/gen/telldir.c (working copy)
@@ -101,8 +101,10 @@
                  return;
          if (lp->loc_loc == dirp->dd_loc && lp->loc_seek ==
dirp->dd_seek)
                  return;
-       (void) lseek(dirp->dd_fd, (off_t)lp->loc_seek, SEEK_SET);
-       dirp->dd_seek = lp->loc_seek;
+       if (lp->loc_seek != dirp->dd_seek) {
+               (void) lseek(dirp->dd_fd, (off_t)lp->loc_seek,
SEEK_SET);
+               dirp->dd_seek = lp->loc_seek;
+       }
yes I did that yesterday but it still fails when you transition
blocks.. (badly).

I also tried bigger blocks.. also fails (eventually)

I did find a way to make it work...  you had to seek back
to the first block you deleted on each set..
then work forward from there again..  unfortunately since
I'm trying to make a microsoft program not fail (via samba)
I have no control over how it does things and seekdir doesn't
know what was deleted anyway... (so the fix is fine for  the
test program but not for real life)

I think I can make the BSD one act like the linux one by changing
the lseek being done to use the offset (loc) plus the buffer seek
address of the target, instead of just going for the buffer base and
stepping forward through the entries..

maybe tomorrow.

The following conditional code makes ours behave the same as the linux
one.
it breaks several 'rules' but works where ours is clean but fails..
as Rick said..  "maybe that's what we should do too."


this is at the end of seekdir()


The new code does what linux does.. and shouldn't work.. but does
              // at least in the limited conditions I need it to.
              // We'll probably need to do this at work...:


The original code is what we have now, but gets mightily confused
sometimes.
         // This is clean(er) but fails in specific situations(when
doing commands
         // from Microft windows, via samba).


root@vps1:/tmp # diff -u dir.c.orig dir.c
--- dir.c.orig    2015-04-24 11:29:36.855317000 -0700
+++ dir.c    2015-04-24 11:15:49.058500000 -0700
@@ -1105,6 +1105,13 @@
           dirp->dd_loc = lp->loc_loc;
           return;
       }
+#ifdef GLIBC_SEEK
+    (void) lseek(dirp->dd_fd, (off_t)lp->loc_seek + lp->loc_loc,
SEEK_SET);
+    dirp->dd_seek = lp->loc_seek + lp->loc_loc;
+    dirp->dd_loc = 0;
+    lp->loc_seek = dirp->dd_seek;
+    lp->loc_loc = 0;
+#else
       (void) lseek(dirp->dd_fd, (off_t)lp->loc_seek, SEEK_SET);
       dirp->dd_seek = lp->loc_seek;
       dirp->dd_loc = 0;
@@ -1114,6 +1121,7 @@
           if (dp == NULL)
               break;
       }
+#endif
   }
Yes, this isn't at all safe.  There's no guarantee whatsoever that
the offset on the directory fd that isn't something returned by
getdirentries has any meaning.
the value returned by getdirentries is the original seek for the block, and the offset is not sent obviosly.. but we could do somethign where getdirentries 'remembers' all teh offsets it sent for the last set of entries, of we could make it return a a seekable cookie for every entry...

  In particular, the size of the
directory entry in a random filesystem might be a different size
than the structure returned by getdirentries (since it converts
things into a FS-independent format).

yes I understand all that.. :-)

This might work for UFS by accident, but this is probably why ZFS
doesn't work.
actually ZFS seems to work too which stunned me..

However, this might be properly fixed by the thing that ino64 is
doing where each directory entry returned by getdirentries gives
you a seek offset that you _can_ directly seek to (as opposed to
seeking to the start of the block and then walking forward N
entries until you get an inter-block entry that is the same).

yeah that might be nice.



_______________________________________________
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Reply via email to