On Mon, May 22, 2017 at 12:48 AM, Junio C Hamano <gits...@pobox.com> wrote:
> Samuel Lijin <sxli...@gmail.com> writes:
>
>> +     for (j = i = 0; i < dir.nr;) {
>> +             for (;
>> +                  j < dir.ignored_nr &&
>> +                    0 <= cmp_dir_entry(&dir.entries[i], &dir.ignored[j]);
>> +                  j++);
>> +
>> +             if ((j < dir.ignored_nr) &&
>> +                             check_dir_entry_contains(dir.entries[i], 
>> dir.ignored[j])) {
>> +                     /* skip any dir.entries which contains a dir.ignored */
>> +                     free(dir.entries[i]);
>> +                     dir.entries[i++] = NULL;
>> +             } else {
>> +                     /* prune the contents of a dir.entries which will be 
>> removed */
>> +                     struct dir_entry *ent = dir.entries[i++];
>> +                     for (;
>> +                          i < dir.nr &&
>> +                            check_dir_entry_contains(ent, dir.entries[i]);
>> +                          i++) {
>> +                             free(dir.entries[i]);
>> +                             dir.entries[i] = NULL;
>> +                     }
>> +             }
>> +     }
>
> The second loop in the else clause is a bit tricky, and the comment
> "which will be removed" is not all that helpful to explain why the
> loop is there.
>
> But I think the code is correct.  Here is how I understood it.
>
>     While looking at dir.entries[i], the code noticed that nothing
>     in that directory is ignored.  But entries in dir.entries[] that
>     come later may be contained in dir.entries[i] and we just want
>     to show the top-level untracked one (e.g. "a/" and "a/b/" were
>     in entries[], there is nothing in "a/", so naturally there is
>     nothing in "a/b/", but we do not want to bother showing
>     both---showing "a/" alone saying "the entire a/ is untracked" is
>     what we want).

Yep, that's the gist of it.

More specifically: the contents of untracked dirs have to be picked up
so that clean -d can distinguish between purely untracked dirs and
untracked dirs which also contain ignored files. In the case of a
purely untracked dir, clean -d can remove the dir itself, and should
just skip over all the dir's contents; in the case of a mixed
untracked dir, clean -d should not remove the dir itself, just the
untracked contents.

> We may want to have a test to ensure "a/b/" is indeed omitted in
> such a situation from the output, though.

Is there a reason to ensure specifically a/b/ is tested, or are the
current tests, which do cover the a/b (i.e. where a/b is a file, not
where a/b/ is a dir) case, insufficient for some reason?

> By the way, instead of putting NULL, it may be easier to follow if
> you used two pointers, src and dst, into dir.entries[], just like
> you did in your latest version of [PATCH 4/6].  That way, you do not
> have to change anything in the later loop that walks over elements
> in the dir.entries[] array.  It would also help the logic easier to
> follow if the above loop were its own helper function.

Agreed on the helper function. On the src-dst thing: I considered it,
but I figured another O(n) set of array moves was unnecessary. I guess
this is one of those cases where premature optimization doesn't make
sense?

> Putting them all together, here is what I came up with that can be
> squashed into your patch.  I am undecided myself if this is easier
> to follow than your version, but it seems to pass your test ;-)
>
> Thanks.
>
>  builtin/clean.c | 70 
> ++++++++++++++++++++++++++++++++++-----------------------
>  1 file changed, 42 insertions(+), 28 deletions(-)
>
> diff --git a/builtin/clean.c b/builtin/clean.c
> index dd3308a447..c8712e7ac8 100644
> --- a/builtin/clean.c
> +++ b/builtin/clean.c
> @@ -851,9 +851,49 @@ static void interactive_main_loop(void)
>         }
>  }
>
> +static void simplify_untracked(struct dir_struct *dir)
> +{
> +       int src, dst, ign;
> +
> +       for (src = dst = ign = 0; src < dir->nr; src++) {
> +               /*
> +                * Skip entries in ignored[] that cannot be inside
> +                * entries[src]
> +                */
> +               while (ign < dir->ignored_nr &&
> +                      0 <= cmp_dir_entry(&dir->entries[src], 
> &dir->ignored[ign]))
> +                       ign++;
> +
> +               if (dir->ignored_nr <= ign ||
> +                   !check_dir_entry_contains(dir->entries[src], 
> dir->ignored[ign])) {
> +                       /*
> +                        * entries[src] does not contain an ignored
> +                        * path -- we need to keep it.  But we do not
> +                        * want to show entries[] that are contained
> +                        * in entries[src].
> +                        */
> +                       struct dir_entry *ent = dir->entries[src++];
> +                       dir->entries[dst++] = ent;
> +                       while (src < dir->nr &&
> +                              check_dir_entry_contains(ent, 
> dir->entries[src])) {
> +                               free(dir->entries[src++]);
> +                       }
> +                       /* compensate for the outer loop's loop control */
> +                       src--;
> +               } else {
> +                       /*
> +                        * entries[src] contains an ignored path --
> +                        * drop it.
> +                        */
> +                       free(dir->entries[src]);
> +               }
> +       }
> +       dir->nr = dst;
> +}
> +
>  int cmd_clean(int argc, const char **argv, const char *prefix)
>  {
> -       int i, j, res;
> +       int i, res;
>         int dry_run = 0, remove_directories = 0, quiet = 0, ignored = 0;
>         int ignored_only = 0, config_set = 0, errors = 0, gone = 1;
>         int rm_flags = REMOVE_DIR_KEEP_NESTED_GIT;
> @@ -928,30 +968,7 @@ int cmd_clean(int argc, const char **argv, const char 
> *prefix)
>                        prefix, argv);
>
>         fill_directory(&dir, &pathspec);
> -
> -       for (j = i = 0; i < dir.nr;) {
> -               for (;
> -                    j < dir.ignored_nr &&
> -                      0 <= cmp_dir_entry(&dir.entries[i], &dir.ignored[j]);
> -                    j++);
> -
> -               if ((j < dir.ignored_nr) &&
> -                               check_dir_entry_contains(dir.entries[i], 
> dir.ignored[j])) {
> -                       /* skip any dir.entries which contains a dir.ignored 
> */
> -                       free(dir.entries[i]);
> -                       dir.entries[i++] = NULL;
> -               } else {
> -                       /* prune the contents of a dir.entries which will be 
> removed */
> -                       struct dir_entry *ent = dir.entries[i++];
> -                       for (;
> -                            i < dir.nr &&
> -                              check_dir_entry_contains(ent, dir.entries[i]);
> -                            i++) {
> -                               free(dir.entries[i]);
> -                               dir.entries[i] = NULL;
> -                       }
> -               }
> -       }
> +       simplify_untracked(&dir);
>
>         for (i = 0; i < dir.nr; i++) {
>                 struct dir_entry *ent = dir.entries[i];
> @@ -959,9 +976,6 @@ int cmd_clean(int argc, const char **argv, const char 
> *prefix)
>                 struct stat st;
>                 const char *rel;
>
> -               if (!ent)
> -                       continue;
> -
>                 if (!cache_name_is_other(ent->name, ent->len))
>                         continue;
>

Reply via email to