On Monday, August 18th, 2025 at 09:27, Bernhard Voelker
<m...@bernhard-voelker.de> wrote:
> I've rolled the change into a proper Git commit and would push it like
> that in your name (unless you don't want that, or want another change).
> Good to go?
I haven't been able to identify another change yet! danny mcClanahan
<dmc2@amass.energy> is fine for attribution.
> > As a side note, I have been working on a library to perform directory
> > traversal and really wished I'd looked at `find` more closely earlier.
> > In particular the discussion of symbolic link handling, stat
> > optimization with -noleaf, and the clever error behavior from
> > -ignore_readdir_race all help to resolve some questions I'd been having
> > difficulty with.
>
> Other implementations using the gnulib FTS module are e.g. in coreutils:
> du(1), rm(1), cp(1) and chmod(1). There might be interesting cases for
> you as well.
Thanks for identifying the FTS module! I've been learning about autotools
recently, as I am working on a build tool, but I am less familiar with gnulib
at the moment and hadn't parsed everything used by findutils.
...[changing subject]...
I wasn't sure earlier, but it now seems appropriate to mention: one of the
problems I've been spending a truly inordinate amount of time on is
multithreaded directory traversal, particularly while incorporating ignore
patterns and deduplicating overlapping symlink targets ("deduplicating" is
difficult to even define consistently in the presence of path-based filtering).
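
To make the deduplication problem concrete, here is a minimal sketch in Python (chosen only for brevity; it is not taken from findutils or any existing library). It assumes a POSIX filesystem where a directory can be identified by its (st_dev, st_ino) pair, and it assumes ignore patterns are shell-style globs matched against the entry name only. The function name `parallel_walk` and its parameters are hypothetical.

```python
import fnmatch
import os
import threading
from concurrent.futures import ThreadPoolExecutor


def parallel_walk(root, ignore=(), workers=4):
    """Return the sorted paths found under root, entering each directory once."""
    root_stat = os.stat(root)
    seen = {(root_stat.st_dev, root_stat.st_ino)}  # directories already claimed
    found = []
    lock = threading.Lock()   # guards seen, found and pending
    pending = 1               # directories queued or currently being read
    done = threading.Event()

    def visit(dirpath):
        nonlocal pending
        try:
            try:
                with os.scandir(dirpath) as it:
                    entries = list(it)
            except OSError:
                entries = []  # unreadable or vanished directory: skip it
            for entry in entries:
                # Filtering is by name only, before any identity check.
                if any(fnmatch.fnmatch(entry.name, pat) for pat in ignore):
                    continue
                try:
                    is_dir = entry.is_dir()  # follows symlinks
                    st = entry.stat() if is_dir else None  # also follows them
                except OSError:
                    continue
                with lock:
                    found.append(entry.path)
                    if not is_dir:
                        continue
                    key = (st.st_dev, st.st_ino)
                    if key in seen:
                        continue  # same directory reached via another path
                    seen.add(key)
                    pending += 1
                pool.submit(visit, entry.path)
        finally:
            with lock:
                pending -= 1
                if pending == 0:
                    done.set()

    with ThreadPoolExecutor(max_workers=workers) as pool:
        pool.submit(visit, root)
        done.wait()
    return sorted(found)
```

Note that whichever thread claims a directory first decides the path under which its contents are reported, so two runs over the same tree can return different (equally valid) path sets. That is one way the definition of "deduplicated" becomes scheduling-dependent once several paths lead to the same directory.
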
On top of that, I also want to construct an in-memory VFS representation which
can be updated in response to e.g. inotify events. One use case for this would
be to update source code indices (I mention an example application at the end
of my EmacsConf talk last year: https://emacsconf.org/2024/talks/regex/).
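
As a rough illustration of the in-memory representation, the sketch below keeps a tree of nested dicts and applies change events to it. The event tuples ("create", "delete", "move") are a hypothetical, simplified stand-in for what a notification source would deliver; real inotify events carry watch descriptors and masks such as IN_CREATE, IN_DELETE, IN_MOVED_FROM and IN_MOVED_TO, and translating those into paths is not shown. The class name `Vfs` and its methods are assumptions for the example, and paths are assumed to be non-empty and "/"-separated.

```python
_MISSING = object()  # sentinel: distinguishes "no such entry" from a file (None)


def _split(path):
    return [part for part in path.split("/") if part]


class Vfs:
    """Toy in-memory tree: a directory is a dict, a file is None."""

    def __init__(self):
        self.root = {}

    def _parent_of(self, parts, create):
        node = self.root
        for part in parts[:-1]:
            child = node.get(part)
            if not isinstance(child, dict):
                if not create:
                    return None
                child = node[part] = {}  # create missing intermediate dirs
            node = child
        return node

    def _attach(self, path, node):
        parts = _split(path)
        self._parent_of(parts, create=True)[parts[-1]] = node

    def _detach(self, path):
        parts = _split(path)
        parent = self._parent_of(parts, create=False)
        if parent is None:
            return _MISSING
        return parent.pop(parts[-1], _MISSING)

    def apply(self, event):
        kind, *args = event
        if kind == "create":
            path, is_dir = args
            self._attach(path, {} if is_dir else None)
        elif kind == "delete":
            self._detach(args[0])  # unknown paths are ignored
        elif kind == "move":
            src, dst = args
            node = self._detach(src)
            if node is not _MISSING:
                self._attach(dst, node)  # a moved directory keeps its subtree
        else:
            raise ValueError(f"unknown event kind: {kind!r}")

    def paths(self):
        out = []

        def walk(node, prefix):
            for name in sorted(node):
                path = f"{prefix}/{name}" if prefix else name
                out.append(path)
                if isinstance(node[name], dict):
                    walk(node[name], path)

        walk(self.root, "")
        return out
```

An index built on top of such a tree would only need to re-read the files named in "create" and "move" events, rather than re-walking the directory hierarchy.
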
I suspect that ignore patterns and notify events are out of scope for findutils
(and are not terribly difficult to achieve with a small shell script invoking
find and other utilities). However, I'd be curious to know if multithreaded
directory traversal would ever be in scope for the find and/or updatedb
commands (I haven't looked at updatedb yet). I suspect it would introduce a
great degree of complexity and nondeterminism into a tool like find, which
otherwise upholds such precise guarantees about its behavior.
Given all of that, is there a project that might be interested in such a
feature? Or is there a way to limit the scope of such a feature that would make
it useful for find itself?
Thanks again for your time, and please don't feel obliged to reply at length.
I fixed a deadlock and other issues in the Rust "ignore" library used by
ripgrep a while ago, and I would like parallel traversal to be available to
portable C utilities as well.
Have a great day,
danny