On Tue, Feb 06, 2018 at 02:24:25PM +0100, Ævar Arnfjörð Bjarmason wrote:

>  3) Such hooks slow down pushes, especially on big repos, you can
>     optimize things a bit (e.g. only look in the same directories), but
>     pathologically you end up needing to compare the cross-product of
>     changed files v.s. all existing files for each changed file.

I think you could just complain about any tree whose entries collide
after normalization. I.e.:

  git rev-list --objects $new --not $old |
  awk '{print $1}' |
  git cat-file --batch-check='%(objecttype) %(objectname)' |
  awk '/^tree/ {print $2}' |
  while read tree; do
        dups=$(git ls-tree "$tree" | cut -f 2- | tr A-Z a-z | sort | uniq -d)
        test -z "$dups" || echo "$tree has duplicates: $dups"
  done

That gives reasonable algorithmic complexity, but of course the shell
implementation is horrific. One could imagine that this could be
implemented as part of fsck_tree(), though, which is already reading
through all the entries (unfortunately it requires auxiliary storage
linear with the size of a given tree object, but that's not too bad).

But it would probably need:

  1. To be enabled as an optional fsck warning, possibly even defaulting
     to "ignore".

  2. That "tr" could be any arbitrary transformation. Case-folding is
     the obvious one, but in theory you could match the normalization
     behavior of certain popular filesystems.

I'm not entirely convinced it's worth all of this effort, but I think it
would be _possible_ at least.

-Peff
