On 5/3/15 10:33 PM, Jilles Tjoelker wrote:
On Fri, May 01, 2015 at 07:17:42PM +0300, Konstantin Belousov wrote:
On Fri, May 01, 2015 at 03:04:51PM +0800, Julian Elischer wrote:
if you are interested in readdir(3), seekdir(3) and telldir(3) then
you should look at
this patches around a problem in seekdir() that breaks Samba.
Seekdir(3) will not work as expected when files prior to the point of
interest in directory have been deleted since the directory was opened.
Windows clients using Samba cause both these things to happen, causing
the next readdir(3) after the bad seekdir(3) to skip some entries and
return the wrong file.
Samba only needs to step back a single directory entry in the case
where it reads an entry and then discovers it can't fit it into the
buffer it is sending to the windows client. It turns out we can
reliably cater to Samba's requirement because the "last returned
element" is always still in memory, so with a little care, we can
set our filepointer back to it safely. (once)
seekdir and readdir (and telldir()) need a complete rewrite along with
getdirentries() but that is more than a small edit like this.
Can you explain your expectations from the whole readdir() vs. parallel
directory modifications interaction ?  From what I understood so far,
there is unlocked modification of the container and parallel iterator
over the same container.  IMO, in such situation, whatever tweaks you
apply to the iterator, it is still cannot be made reliable.
Before making single-purpose changes to the libc readdir and seekdir
code, or to the kernel code, it would be useful to state exact behaviour
of the dirent machinery we want to see. No, 'make samba works in my
situation' does not sound good enough.
Consider the subsequence of entries that existed at opendir() time and
were not removed until now. This subsequence is clearly defined and does
not have concurrency problems. The order of this subsequence must remain
unchanged and seekdir() must be correct with respect to this

Additionally, two other kinds of entries may be returned. New entries
may be inserted anywhere in between the entries of the subsequence, and
removed entries may be returned as if they were still part of the
subsequence (so that not every readdir() needs a system call).

A simple implementation for UFS-style directories is to store the offset
in the directory (all bits of it, not masking off the lower 9 bits).
This needs d_off or similar in struct dirent. The kernel getdirentries()
then needs a similar loop as the old libc seekdir() to go from the start
of the 512-byte directory block to the desired entry (since an entry may
not exist at the stored offset within the directory block).

This means that a UFS-style directory cannot be compacted (existing
entries moved from higher to lower offsets to fill holes) while it is
open for reading. An NFS exported directory is always open for reading.

This also means that duplicate entries can only be returned if that
particular filename was deleted and created again.

Without kernel support, it is hard to get telldir/seekdir completely
reliable. The current libc implementation is wrong since the "holes"
within the block just disappear and change the offsets of the following
entries; the kernel cannot fix this using entries with d_fileno = 0
since it cannot know, in the general case, how long the deleted entry
was in the filesystem-independent dirent format. My previous idea of
storing one d_fileno during telldir() is wrong since it will fail if
that entry is deleted.

If you do not care about memory usage (which probably is already
excessive with the current libc implementation), you could store at
telldir() time the offset of the current block returned by
getdirentries() and the d_fileno of all entries already returned in the
current block.

The D2410 patch can conceptually work for what Samba needs, stepping
back one directory entry. I will comment on it.

how long do I have to wait until I can commit this and was kib's comment a
"do not commit"?

freebsd-current@freebsd.org mailing list
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Reply via email to