On 4/25/15 9:39 AM, Rick Macklem wrote:
Jilles Tjoelker wrote:
On Fri, Apr 24, 2015 at 04:28:12PM -0400, John Baldwin wrote:
Yes, this isn't at all safe.  There's no guarantee whatsoever that an
offset on the directory fd that was not returned by getdirentries has
any meaning.  In particular, the size of the
directory entry in a random filesystem might be a different size
than the structure returned by getdirentries (since it converts
things into a FS-independent format).
This might work for UFS by accident, but this is probably why ZFS
doesn't work.
However, this might be properly fixed by what the ino64 branch is
doing, where each directory entry returned by getdirentries gives
you a seek offset that you _can_ directly seek to (as opposed to
seeking to the start of the block and then walking forward N
entries until you get an inter-block entry that is the same).
The ino64 branch only reserves space for d_off and does not use it in
any way. This is appropriate since actually using d_off is a major
feature addition.

Well, at some point ino64 will need to define a new getdirentries(2)
syscall and I believe this new syscall can have different/additional
arguments.
Yes, POSIX only specifies two mandatory fields (d_ino and d_name);
everything else is implementation-dependent.
I'd suggest that the new getdirentries(2) syscall should return a
flag to indicate that the underlying file system is filling in d_off.
Then the libc functions can use d_off if it is available.
(They will still need to "work" at least as well as they do now if
  the file system doesn't support d_off. The old getdirentries(2) syscall
  will be returning the old/current "struct dirent" which doesn't have
  the field anyhow.)
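A minimal sketch of the libc side of that proposal, assuming the new syscall reports a per-buffer flag. The struct names, the `fs_fills_d_off` flag and the "skip N entries" fallback are illustrative assumptions, not an existing FreeBSD API, and the fallback is simplified compared with what libc actually records today.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical extra information from the proposed new getdirentries(2). */
struct dirbuf_info {
    bool    fs_fills_d_off;  /* set when the file system fills in d_off */
    int64_t buf_seek;        /* directory offset the buffer was read from */
};

/* Hypothetical position that telldir() could remember. */
struct saved_pos {
    bool    use_d_off;
    int64_t seek;            /* d_off, or the buffer start for the fallback */
    int64_t skip;            /* entries to step over after seeking (fallback) */
};

struct saved_pos make_saved_pos(const struct dirbuf_info *info,
                                int64_t entry_d_off, int64_t index_in_buf)
{
    struct saved_pos p;
    if (info->fs_fills_d_off) {
        p.use_d_off = true;       /* seek straight to the entry */
        p.seek = entry_d_off;
        p.skip = 0;
    } else {
        p.use_d_off = false;      /* fallback: buffer start plus a walk */
        p.seek = info->buf_seek;
        p.skip = index_in_buf;
    }
    return p;
}
```

The point of the flag is that the same libc code keeps working on file systems that never fill in d_off.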

Another bit of fun is that the argument for seekdir()/telldir() is a
long, which is 32 bits on some arches. d_off is 64 bits, since that
is what some file systems require.
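A quick illustration of why a 64-bit d_off cannot simply be passed through a 32-bit long: two different cookies that differ only above bit 31 end up identical. The helper names are made up for this example, and `uint32_t` stands in for a 32-bit long so that the narrowing conversion is well defined in C.

```c
#include <stdbool.h>
#include <stdint.h>

/* Keeps only the low 32 bits, i.e. d_off mod 2^32. */
uint32_t cookie_low32(uint64_t d_off)
{
    return (uint32_t)d_off;
}

/* True if two different 64-bit cookies collapse to the same 32-bit value. */
bool cookies_collide(uint64_t a, uint64_t b)
{
    return a != b && cookie_low32(a) == cookie_low32(b);
}
```

For example, `0x100000005` and `5` collide, so a seekdir() given the truncated value could not tell which position was meant.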
What does Linux use?
------
       In glibc up to version 2.1.1, the return type of telldir() was off_t.
       POSIX.1-2001 specifies long, and this is the type used since glibc
       2.1.2.

Also from the Linux man page (this is interesting):

--------
       In early filesystems, the value returned by telldir() was a simple
       file offset within a directory.  Modern filesystems use tree or hash
       structures, rather than flat tables, to represent directories.  On
       such filesystems, the value returned by telldir() (and used
       internally by readdir(3)) is a "cookie" that is used by the
       implementation to derive a position within a directory.  Application
       programs should treat this strictly as an opaque value, making no
       assumptions about its contents.
------
But glibc itself uses the contents in a non-opaque (and possibly wrong) way in seekdir(),
not following its own advice.
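In the spirit of that advice, a portable caller only hands a telldir() value back to seekdir() on the same DIR stream, without decoding it or doing arithmetic on it. A minimal sketch using the standard POSIX calls; the helper name `seek_roundtrip` is made up for this example.

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if seeking back to a saved telldir() cookie re-reads the same
 * entry name, 0 if not, -1 on error. The cookie is never inspected. */
int seek_roundtrip(const char *path, int skip)
{
    DIR *d = opendir(path);
    if (d == NULL)
        return -1;

    for (int i = 0; i < skip; i++) {
        if (readdir(d) == NULL) {
            closedir(d);
            return -1;
        }
    }

    long cookie = telldir(d);        /* opaque: store it, don't decode it */
    struct dirent *e = readdir(d);
    char *first = (e != NULL) ? strdup(e->d_name) : NULL;
    if (first == NULL) {
        closedir(d);
        return -1;
    }

    while (readdir(d) != NULL)       /* move well past the saved position */
        ;
    seekdir(d, cookie);              /* hand the cookie back unchanged */
    e = readdir(d);

    int same = (e != NULL && strcmp(first, e->d_name) == 0);
    free(first);
    closedir(d);
    return same;
}
```

POSIX only promises this round trip for a value obtained from telldir() on the same stream; anything else, such as computing a cookie from an entry's size, is outside what the caller can rely on.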


Maybe the library code can only use d_off if it is a 64-bit arch and
the file system is filling it in. (Or maybe the library can keep track
of 32<->64-bit mappings for the offsets. I haven't looked at the libc
functions for a while, so I can't remember what they keep track of.)
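One way to do the 32<->64-bit bookkeeping is for telldir() to hand out small integer indices into a per-DIR table of saved 64-bit positions. This is a hypothetical fixed-size table written for illustration; it is not the actual libc code, and the names and capacity are assumptions.

```c
#include <stdint.h>

#define MAX_COOKIES 64            /* hypothetical capacity */

/* Hypothetical per-DIR table: a small long indexes a 64-bit position. */
struct cookie_table {
    int64_t pos[MAX_COOKIES];
    long    used;
};

/* Returns a small cookie for `d_off`, reusing the slot if the same offset
 * was saved before; returns -1 if the table is full. */
long cookie_save(struct cookie_table *t, int64_t d_off)
{
    for (long i = 0; i < t->used; i++)
        if (t->pos[i] == d_off)
            return i;
    if (t->used == MAX_COOKIES)
        return -1;
    t->pos[t->used] = d_off;
    return t->used++;
}

/* Maps a cookie back to its 64-bit offset; returns 0 on success. */
int cookie_lookup(const struct cookie_table *t, long cookie, int64_t *d_off)
{
    if (cookie < 0 || cookie >= t->used)
        return -1;
    *d_off = t->pos[cookie];
    return 0;
}
```

The cost of this design is memory that grows with the number of distinct telldir() calls on a stream, so a real implementation would at least release the table in closedir().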

One supposes a 32-bit system would not have such large file systems on it
(maybe?).

rick

A proper d_off would still be useful even if UFS's readdir keeps masking
off the offset so a directory read always starts at the beginning of a
512-byte directory block, since this allows more distinct offset values
than safely using getdirentries()'s *basep. With d_off, one outer loop
must read at least one directory block to avoid spinning indefinitely,
while using getdirentries()'s *basep requires reading the whole
getdirentries() buffer.
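The granularity argument can be checked with a little arithmetic. In this model each resume point is an entry offset rounded down either to its 512-byte directory block (as in the text) or to the start of a 4096-byte getdirentries() buffer. The buffer size, the assumption that buffers start at multiples of it, and the entry offsets are all made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts distinct resume points when each entry offset is rounded down to
 * a multiple of `granule`. `offs` must be non-negative and sorted. */
size_t count_resume_points(const int64_t *offs, size_t n, int64_t granule)
{
    size_t count = 0;
    int64_t prev = -1;
    for (size_t i = 0; i < n; i++) {
        int64_t start = offs[i] - offs[i] % granule;
        if (start != prev) {
            count++;
            prev = start;
        }
    }
    return count;
}
```

With entry offsets 0, 100, 600, 1100 and 4200, rounding to 512-byte blocks gives 4 resume points, rounding to the 4096-byte buffer gives 2, and per-entry cookies (granule 1) would give 5.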

Some Linux filesystems go further and provide a unique d_off for each entry.

Another idea would be to store the last d_ino instead of dd_loc into the
struct ddloc. On seekdir(), this would seek to loc_seek as before and
skip entries until that d_ino is found, or to the start of the buffer if
not found (and possibly return some entries again that should not be
returned, but Samba copes with that).
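A minimal sketch of that resume rule over one already-read buffer, assuming "the last d_ino" means the inode number of the entry most recently returned before telldir(), so reading resumes just after it. The `buf_entry` type and `resume_index` helper are hypothetical stand-ins, not libc internals.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one entry of a getdirentries() buffer. */
struct buf_entry {
    uint64_t    d_ino;
    const char *d_name;
};

/* Index to resume reading at after seeking back to the buffer start. */
size_t resume_index(const struct buf_entry *buf, size_t n, uint64_t last_ino)
{
    for (size_t i = 0; i < n; i++)
        if (buf[i].d_ino == last_ino)
            return i + 1;     /* continue just after the remembered entry */
    return 0;                 /* not found: restart; entries may repeat */
}
```

One caveat of keying on d_ino: hard links to the same inode inside one directory share a d_ino, so this rule would resume after the first such name.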

--
Jilles Tjoelker
_______________________________________________
[email protected] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to
"[email protected]"