On Tue, Jun 23, 2020 at 02:30:51AM -0700, Michael Forney wrote:
> recurse() is peculiar function since it is used for two purposes:
> 
> 1. To bootstrap the recursion process.
> 2. To perform recursion when a directory is encountered.
> 
> In the first case, we stat() the directory, record it in the history
> list, then depending on the value of DIRFIRST, we call the function
> and then recurse or vice-versa. The function then calls recurse()
> again on the same directory, but since we have already added it to
> the history list, this call gets pruned.
> 
> In the second case, we stat() the directory a second time (we already
> did this when traversing the parent directory's entries), then call
> the function on each entry.
> 
> This approach leads to a few problems:
> 
> - We stat() each directory in the hierarchy twice, once at the
>   beginning of recurse(), and once for each contained directory.
> - We need a DIRFIRST flag to specify whether the function recurses
>   at the start or end, even though just running the function would
>   do things in the correct order.
> - History is only checked when we are about to recurse on a directory,
>   *after* we have already operated on it. So while we do prevent
>   infinite loops, we double-count the directory.

I don't feel qualified to criticise the overall design, but do we not still
need a way to specify whether traversal should be pre-order or post-order?
I figure this is what DIRFIRST was for right?

>
[snip]
> +
>  void
>  recurse(int dirfd, const char *name, void *data, struct recursor *r)
>  {
> -     struct dirent *d;
> -     struct history *new, *h;
> -     struct stat st, dst;
> -     DIR *dp;
> -     int flags = 0, fd;
> -     size_t pathlen = r->pathlen;
> -
> -     if (dirfd == AT_FDCWD)
> -             pathlen = estrlcpy(r->path, name, sizeof(r->path));

Now that we no longer do this, r->path is not being reset between
separate calls, so we have:

% mkdir D
% echo 'hello world' > f
% du D f
4       D
4       f
% ~/sbase/du D f
4       D
4       D

Applying

diff --git a/du.c b/du.c
index 6388159..e06234a 100644
--- a/du.c
+++ b/du.c
@@ -109,6 +109,7 @@ main(int argc, char *argv[])
                for (; *argv; argc--, argv++) {
                        n = 0;
                        recurse(AT_FDCWD, *argv, &n, &r);
+                       r.pathlen = 0;
                }
        }

fixes this, though I guess it's better to do the initialisation in recurse().

Thanks,
Richard

Reply via email to