Re: [PATCH v5 3/6] dir_iterator: iterate over dir after its contents

Michael Haggerty Thu, 30 Mar 2017 04:03:58 -0700

On 03/30/2017 05:32 AM, Daniel Ferreira wrote:
> Create an option for the dir_iterator API to iterate over subdirectories
> only after having iterated through their contents. This feature was
> predicted, although not implemented by 0fe5043 ("dir_iterator: new API
> for iterating over a directory tree", 2016-06-18).
> 
> Add the "flags" parameter to dir_iterator_create, allowing for the
> aforementioned "depth-first" iteration mode to be enabled. Currently,
> the only acceptable flag is DIR_ITERATOR_DEPTH_FIRST.


The flag name has been changed.

> This is useful for recursively removing a directory and calling rmdir()
> on a directory only after all of its contents have been wiped.
> 
> Amend a call to dir_iterator_begin() to pass the flags parameter
> introduced.
> 
> Signed-off-by: Daniel Ferreira <bnm...@gmail.com>
> ---
>  dir-iterator.c       | 53 
> ++++++++++++++++++++++++++++++++++++++++++++++++----
>  dir-iterator.h       | 17 ++++++++++++-----
>  refs/files-backend.c |  2 +-
>  3 files changed, 62 insertions(+), 10 deletions(-)
> 
> diff --git a/dir-iterator.c b/dir-iterator.c
> index 3ac984b..05d53d2 100644
> --- a/dir-iterator.c
> +++ b/dir-iterator.c
> @@ -47,6 +47,9 @@ struct dir_iterator_int {
>        * that will be included in this iteration.
>        */
>       struct dir_iterator_level *levels;
> +
> +     /* Holds the flags passed to dir_iterator_begin(). */
> +     unsigned flags;
>  };
>  
>  static inline void push_dir_level(struct dir_iterator_int *iter, struct 
> dir_iterator_level *level)
> @@ -113,12 +116,14 @@ int dir_iterator_advance(struct dir_iterator 
> *dir_iterator)
>                                       iter->base.path.buf, strerror(errno));
>                               /* Popping the level is handled below */
>                       }
> -             } else if (S_ISDIR(iter->base.st.st_mode)) {
> +             } else if (S_ISDIR(iter->base.st.st_mode) &&
> +                     !(iter->flags & DIR_ITERATOR_POST_ORDER_TRAVERSAL)) {
>                       if (level->dir_state == DIR_STATE_ITER) {
>                               /*
>                                * The directory was just iterated
>                                * over; now prepare to iterate into
> -                              * it.
> +                              * it (unless an option is set for us
> +                              * to do otherwise).
>                                */
>                               push_dir_level(iter, level);
>                               continue;
> @@ -152,7 +157,7 @@ int dir_iterator_advance(struct dir_iterator 
> *dir_iterator)
>                       de = readdir(level->dir);
>  
>                       if (!de) {
> -                             /* This level is exhausted; pop up a level. */
> +                             /* This level is exhausted  */
>                               if (errno) {
>                                       warning("error reading directory %s: 
> %s",
>                                               iter->base.path.buf, 
> strerror(errno));
> @@ -160,6 +165,32 @@ int dir_iterator_advance(struct dir_iterator 
> *dir_iterator)
>                                       warning("error closing directory %s: 
> %s",
>                                               iter->base.path.buf, 
> strerror(errno));
>  
> +                             if (iter->flags & 
> DIR_ITERATOR_POST_ORDER_TRAVERSAL) {
> +                                     /* If we are handling dirpaths after 
> their contents,

The comment opening and closing delimiters should be on their own lines.

> +                                      * we have to iterate over the 
> directory now that we'll
> +                                      * have finished iterating into it. */
> +                                     level->dir = NULL;

`level->dir = NULL` can be done before the `if` statement rather than
both here and after the `if` statement.

> +
> +                                     if (pop_dir_level(iter) == 0)
> +                                             return 
> dir_iterator_abort(dir_iterator);
> +
> +                                     level = &iter->levels[iter->levels_nr - 
> 1];
> +                                     /* Since we are iterating through the 
> dirpath

Comment opening should be on its own line.

> +                                      * after we have gone through it, we 
> still need
> +                                      * to get rid of the trailing slash we 
> appended.
> +                                      *
> +                                      * This may generate issues if we ever 
> want to
> +                                      * iterate through the root directory 
> AND have
> +                                      * post-order traversal enabled.
> +                                      */
> +                                     strbuf_strip_suffix(&iter->base.path, 
> "/");

Hmmm, I was wrong again. Remember that this code path is stripping the
"/" from the directory that has just been iterated over, and has already
been closed and popped off the stack. But if we are iterating over "/",
then when it is popped off the stack, `pop_dir_level(iter)` returns
zero, and we never get to this line. So I think you can remove this warning.

> +
> +                                     if (set_iterator_data(iter, level))
> +                                             continue;
> +
> +                                     return ITER_OK;
> +                             }
> +
>                               level->dir = NULL;
>                               if (pop_dir_level(iter) == 0)
>                                       return dir_iterator_abort(dir_iterator);
> @@ -174,6 +205,18 @@ int dir_iterator_advance(struct dir_iterator 
> *dir_iterator)
>                       if (set_iterator_data(iter, level))
>                               continue;
>  
> +                     /*
> +                      * If we want to iterate dirs after files, we shall
> +                      * begin looking into them *before* we return the dir
> +                      * itself.
> +                      */
> +                     if (S_ISDIR(iter->base.st.st_mode) &&
> +                             (iter->flags & 
> DIR_ITERATOR_POST_ORDER_TRAVERSAL)) {
> +                             push_dir_level(iter, level);
> +
> +                             break;
> +                     }
> +
>                       return ITER_OK;
>               }
>       }
> @@ -200,7 +243,7 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
>       return ITER_DONE;
>  }
>  
> -struct dir_iterator *dir_iterator_begin(const char *path)
> +struct dir_iterator *dir_iterator_begin(const char *path, unsigned flags)
>  {
>       struct dir_iterator_int *iter = xcalloc(1, sizeof(*iter));
>       struct dir_iterator *dir_iterator = &iter->base;
> @@ -208,6 +251,8 @@ struct dir_iterator *dir_iterator_begin(const char *path)
>       if (!path || !*path)
>               die("BUG: empty path passed to dir_iterator_begin()");
>  
> +     iter->flags = flags;
> +
>       strbuf_init(&iter->base.path, PATH_MAX);
>       strbuf_addstr(&iter->base.path, path);
>  
> diff --git a/dir-iterator.h b/dir-iterator.h
> index 27739e6..649ccf6 100644
> --- a/dir-iterator.h
> +++ b/dir-iterator.h
> @@ -11,8 +11,7 @@
>   * Every time dir_iterator_advance() is called, update the members of
>   * the dir_iterator structure to reflect the next path in the
>   * iteration. The order that paths are iterated over within a
> - * directory is undefined, but directory paths are always iterated
> - * over before the subdirectory contents.
> + * directory is undefined.
>   *
>   * A typical iteration looks like this:
>   *
> @@ -38,6 +37,13 @@
>   * dir_iterator_advance() again.
>   */
>  
> +/* Possible flags for dir_iterator_begin().

Comment opening should be on its own line.

> [...]

So, using your handy-dandy `test-dir-iterator` program, I did a bunch of
manual testing.

It's a little bit unfortunate (though certainly not a showstopper) that
`lstat()` is called twice on directories in `POST_ORDER` mode.

More significant is that pre-order traversal includes unreadable
directory entries (e.g., those whose permissions don't allow reading),
whereas post-order traversal omits them. The reason is that when
`opendir()` fails, the `if (!level->dir)` code block pops it off the
stack before the `if (iter->flags & DIR_ITERATOR_POST_ORDER_TRAVERSAL)`
block gets a chance to return it to the caller.

This is not great. It seems like callers would want to know about
unreadable directories within the tree.

I haven't checked whether that problem can be fixed up easily in the
current code. But I'm getting convinced that the code (i.e., even my
code, before your changes) is more convoluted than it needs to be, and
that makes it too hard to work with. Quite likely you have the same
feeling by now... :-/

Imagine something like this:

Change the `dir_state` enumeration to the following:

* DIR_STATE_PUSHED -- the top-of-stack directory has just been pushed
  onto the stack but not iterated over or opened.
* DIR_STATE_PRE_ITERATED -- the top-of-stack directory has just been
  returned to the caller (pre-order), but not opened.
* DIR_STATE_ITERATING -- the top-of-stack directory has been opened and
  we are currently iterating over its entries.
* DIR_STATE_ITERATED -- the top-of-stack directory's entries have all
  been iterated over, and the directory has been closed.
* DIR_STATE_POST_ITERATED -- the top-of-stack directory has been
  returned to the caller (post-order) and is ready to be popped.

Then the function could have a single loop with a more conventional
state machine structure *doing exactly one thing per iteration*,
something like:

while (1) {
        if (level->dir_state == DIR_STATE_PUSHED) {
                if (iter->flags & DIR_ITERATOR_DIRS_BEFORE) {
                        ...set iterator data...
                        level->dir_state = DIR_STATE_PRE_ITERATED;
                        return ITER_OK;
                } else {
                        level->dir_state = DIR_STATE_PRE_ITERATED;
                }
        } else if (level->dir_state == DIR_STATE_PRE_ITERATED) {
                if (iter->flags & DIR_ITERATOR_RECURSE) {
                        ...open directory...
                        level->dir_state = DIR_STATE_ITERATING;
                } else {
                        level->dir_state = DIR_STATE_ITERATED;
                }
        } else if (level->dir_state == DIR_STATE_PRE_ITERATING) {
                if (readdir(...)) {
                        if (S_ISDIR) {
                                ...push dir...
                                level->dir_state = DIR_STATE_PUSHED;
                        } else {
                                ...set iterator data...
                                return ITER_OK;
                        }
                } else {
                        level->dir_state = DIR_STATE_ITERATED;
                }
        } else if (level->dir_state == DIR_STATE_ITERATED) {
                if (iter->flags & DIR_ITERATOR_DIRS_AFTER) {
                        ...set iterator data...
                        level->dir_state = DIR_STATE_POST_ITERATED;
                        return ITER_OK;
                } else {
                        level->dir_state = DIR_STATE_POST_ITERATED;
                }
        } else if (level->dir_state == DIR_STATE_POST_ITERATED) {
                ...pop dir...
                if (...done...) {
                        break;
                }
        }
}

It's a biggish rewrite (and I'm not implying that you have to do it),
but I think the results would be more readable than the current code.

Michael

Re: [PATCH v5 3/6] dir_iterator: iterate over dir after its contents

Reply via email to