Paul Eggert <[EMAIL PROTECTED]> wrote:

> Jim Meyering <[EMAIL PROTECTED]> writes:
>
>> When an NFS client sees a successful unlink, it is reasonable to
>> expect a client-side rewinddir/readdir sequence *not* to produce
>> the just-unlinked name.
>
> I agree, but won't this hurt rewinddir performance?  After all, one of
Won't *what* hurt rewinddir performance?  The test for whether to call
rewinddir is performed only upon reaching the end of the directory
(readdir returns NULL with errno == 0).  The rewinddir call is then
performed only when n_unlinked_since_opendir_or_last_rewind is at least
as large as CONSECUTIVE_READDIR_UNLINK_THRESHOLD.  The cost of calling
rewinddir (and rereading any "." and ".." entries) once per directory
containing CONSECUTIVE_READDIR_UNLINK_THRESHOLD or more entries seems
small.  I think I measured it back when I lowered its value to 10.

Just in case, I've timed it again, using what should be a worst-case
scenario: a hierarchy with a branching factor of 11.  This example has
11^5 leaf directories and no files, on a tmpfs file system.  All names
have a /t/z/ prefix.  Here are examples:

  /t/z/0/0/0/1/0
  /t/z/0/0/0/0/10
  /t/z/0/0/0/0/9
  /t/z/0/0/0/0/8
  /t/z/0/0/0/0/7
  /t/z/0/0/0/0/6
  /t/z/0/0/0/0/5
  /t/z/0/0/0/0/4
  /t/z/0/0/0/0/3
  /t/z/0/0/0/0/2
  /t/z/0/0/0/0/1
  /t/z/0/0/0/0/0

Removing such a hierarchy on an AMD-64/3400+ with linux-2.6.18-3-amd64
and GNU rm (6.8+) averages about 6.1 seconds on a mostly idle system.
However, there is so much deviation (sometimes over 1s) in those
timings that there is no hope of seeing any difference when NEED_REWIND
is 0.  Even ignoring the worst deviations, if there is a difference, it
is immeasurable.

Then I realized that the above, being a tree of all directories, wasn't
a good test at all: the rewinddir code is not relevant, since the
number of consecutive_readdirs_... never exceeds the threshold.  So I
redid it with the leaves as files.
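For readers following along, the end-of-directory check described above can
be sketched roughly as follows.  This is only an illustration, not the
actual coreutils remove.c code: the function name empty_dir, the fixed-size
path buffer, and the error handling are simplifications of mine; only the
counter name and the threshold constant follow the discussion above.

```c
#include <dirent.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Threshold and counter name follow the discussion above.  */
enum { CONSECUTIVE_READDIR_UNLINK_THRESHOLD = 10 };

/* Unlink every non-directory entry in DIR_NAME, rewinding upon reaching
   end-of-directory whenever enough entries have been unlinked that a
   buggy readdir implementation might have skipped some names.
   Return 0 on success, -1 on failure.  */
static int
empty_dir (char const *dir_name)
{
  DIR *dirp = opendir (dir_name);
  if (!dirp)
    return -1;

  size_t n_unlinked_since_opendir_or_last_rewind = 0;
  int err = 0;

  while (true)
    {
      errno = 0;
      struct dirent *dp = readdir (dirp);
      if (dp == NULL)
        {
          if (errno != 0)
            {
              err = -1;         /* a real readdir failure */
              break;
            }
          /* End of directory: rewind only when we may have confused
             the readdir stream by unlinking many entries; otherwise
             we are done.  */
          if (n_unlinked_since_opendir_or_last_rewind
              < CONSECUTIVE_READDIR_UNLINK_THRESHOLD)
            break;
          rewinddir (dirp);
          n_unlinked_since_opendir_or_last_rewind = 0;
          continue;
        }

      if (strcmp (dp->d_name, ".") == 0 || strcmp (dp->d_name, "..") == 0)
        continue;

      char path[4096];
      snprintf (path, sizeof path, "%s/%s", dir_name, dp->d_name);
      if (unlink (path) == 0)
        n_unlinked_since_opendir_or_last_rewind++;
    }

  closedir (dirp);
  return err;
}
```

Note that the loop always terminates: after a rewind the counter restarts
at zero, so a pass that unlinks fewer than the threshold ends the loop.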
With that scenario, rm runs much faster, so it can remove a 10^6-leaf
tree in a reasonable amount of time (once I made sure there was 1GB
free on the target file system :-), so I adjusted
CONSECUTIVE_READDIR_UNLINK_THRESHOLD = 9 and reran.

Here's stock rm with the above change:

$ for i in $(seq 10); do z-mktree --root=/t/z --b=10 --d=6; /usr/bin/time /cu/src/rm -rf /t/z; done
1.11user 9.82system 0:11.08elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+166minor)pagefaults 0swaps
1.00user 8.59system 0:09.63elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+166minor)pagefaults 0swaps
0.99user 8.99system 0:11.96elapsed 83%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+166minor)pagefaults 0swaps
1.15user 8.75system 0:11.48elapsed 86%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+167minor)pagefaults 0swaps
1.28user 8.39system 0:09.68elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+166minor)pagefaults 0swaps
1.00user 8.80system 0:09.90elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+166minor)pagefaults 0swaps
1.05user 8.94system 0:10.15elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+166minor)pagefaults 0swaps
1.14user 9.42system 0:10.58elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+166minor)pagefaults 0swaps
1.14user 8.70system 0:09.89elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+166minor)pagefaults 0swaps
0.88user 9.63system 0:12.05elapsed 87%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+166minor)pagefaults 0swaps

Here's the same test, but with stock rm having NEED_REWIND == 0, i.e.,
no extra rewinddir or readdir calls:

0.93user 8.62system 0:09.85elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+166minor)pagefaults 0swaps
0.92user 8.63system 0:09.66elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+166minor)pagefaults 0swaps
1.08user 9.50system 0:11.35elapsed 93%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+167minor)pagefaults 0swaps
1.08user 9.15system 0:10.68elapsed 95%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+167minor)pagefaults 0swaps
0.99user 9.28system 0:10.28elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+167minor)pagefaults 0swaps
1.14user 7.96system 0:09.53elapsed 95%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+166minor)pagefaults 0swaps
1.02user 9.05system 0:10.10elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+166minor)pagefaults 0swaps
1.04user 8.35system 0:09.40elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+166minor)pagefaults 0swaps
0.98user 8.40system 0:09.45elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+167minor)pagefaults 0swaps
1.14user 8.76system 0:09.91elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+166minor)pagefaults 0swaps

As before, there's so much noise (I don't know why) that it's hard to
compare them, but here are the summaries, considering only elapsed time:

  With the extra rewinddir + two readdir calls:
    avg: 10.64    std. deviation: 0.94
  Without them:
    avg: 10.021   std. deviation: 0.61

With the difference at just 6%, and about the same size as one standard
deviation, and considering this is the absolute worst case, I'm not
worried.

> the goals of the existing Mac OS workaround was to not be much of a
> performance hit elsewhere.

Definitely.  Are you concerned?

>> I hope this sort of coherence (between
>> an unlink syscall and a subsequent rewinddir/readdir) is guaranteed
>> by a standard.
>
> "standard"?  NFS?  I'm afraid not.  Nobody guarantees POSIX behavior
> once NFS is involved, as far as I know.

Right :)  "in common practice" would be enough.
> PS.  Somewhat off the subject, anyone interested in NFS and guarantees
> should know about Ed Nightingale's recent work in this area, e.g., his
> "Rethink the Sync" paper.  The underlying idea (speculative execution
> in Linux) is a good one, and the performance results are impressive.
> See <http://notrump.eecs.umich.edu/group/group.html>.  It's
> first-class stuff (not that it'll help you here....).

Thanks for the pointer.

_______________________________________________
Bug-coreutils mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/bug-coreutils
