On Sat, Dec 27, 2008 at 11:42 AM, Jim Meyering <[email protected]> wrote:
> If find were to apply this same technique, it would make the
> -noleaf predicate a no-op.
>
> FYI, when I run the modified find on a reiserfs-backed 1.6M-file maildir
> hierarchy, it takes only 80 seconds (2.6.26, athlon64 3400+, 2yr-old disk),
> while using the latest unmodified find, at over 16 minutes, it takes 12
> times as long.
>
> I'll post again when I have an fts patch implementing
> the approach outlined above.

This performance improvement is impressive and welcome. I very much
hope that we can end up with this improvement, or something like it,
without introducing a bug.

I'd like to point out two potential problems, though:

1. Some versions of Linux have bugs in the implementation of various
   handlers in the /proc filesystem that cause st_nlink to be
   misleading. I've reported a couple of these over the years. I have
   no mechanism for detecting such bugs in released kernel versions
   (not least because some parts of /proc exist only on some
   architectures, or only if some hardware is present or some feature
   is available). A recent example is described at
   <http://lists.debian.org/debian-user/2008/08/msg00972.html>.
   I'm mostly happy to write these cases off as kernel bugs that
   should be fixed. However, if we end up using some kind of
   filesystem whitelist, Linux's proc filesystem should be excluded
   from it.

2. There exist other filesystems where
   (st_nlink - (0 or 2)) < (subdirectory count). The only example I
   can think of offhand is AFS. There is some incomplete information
   at <http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=143111>.

Thanks,
James.
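The inequality in point 2 can be checked empirically on a given mount. What follows is only a rough sketch in Python, not anything taken from find or fts: the names `count_subdirs`, `violates_leaf_assumption` and `find_violations` are made up for illustration, and the `overhead` parameter stands in for the "(0 or 2)" term above. It lists directories whose link count, minus that overhead, is smaller than the number of subdirectories actually present.

```python
import os


def count_subdirs(path):
    """Count immediate subdirectories of path, not following symlinks."""
    count = 0
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                count += 1
    return count


def violates_leaf_assumption(nlink, subdirs, overhead=2):
    """True when (st_nlink - overhead) < (subdirectory count).

    overhead is an illustrative parameter: 2 accounts for "." and the
    entry in the parent directory; 0 covers filesystems that count
    neither.
    """
    return nlink - overhead < subdirs


def find_violations(root, overhead=2):
    """Yield (path, st_nlink, subdirs) for directories under root whose
    link count understates the real number of subdirectories."""
    for dirpath, _dirnames, _filenames in os.walk(root):
        try:
            nlink = os.lstat(dirpath).st_nlink
            subdirs = count_subdirs(dirpath)
        except OSError:
            # Unreadable or vanished directory (common under /proc): skip it.
            continue
        if violates_leaf_assumption(nlink, subdirs, overhead):
            yield dirpath, nlink, subdirs


if __name__ == "__main__":
    import sys

    for path, nlink, subdirs in find_violations(sys.argv[1]):
        print(f"{path}: st_nlink={nlink}, subdirectories={subdirs}")
```

Running this against a mount point prints every directory where a link-count-based leaf optimization would be unsafe. An empty result does not prove a filesystem is safe, since entries can change during the walk and unreadable directories are skipped.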
