On Tue, Apr 29, 2014 at 10:12:52AM -0700, Junio C Hamano wrote:
> Jeff King <p...@peff.net> writes:
> > This patch just adds a test to demonstrate the breakage.
> > Some possible fixes are:
> > 1. Tell everyone that NFD in the git repo is wrong, and
> > they should make a new commit to normalize all their
> > in-repo files to be precomposed.
> > This is probably not the right thing to do, because it
> > still doesn't fix checkouts of old history. And it
> > spreads the problem to people on byte-preserving
> > filesystems (like ext4), because now they have to start
> > precomposing their filenames as they are added to git.
> Hmm, have we taught the "compare precomposed" for codepaths that
> compare two trees and a tree and the index, too? Otherwise, we
> would have the same issue with commits in the old history.
Ugh, yeah, I didn't think about that codepath. I think we would not want
to precompose in that case. IOW, git works byte-wise internally, but it
is only at the filesystem layer that we do such munging. The index
straddles the line between the filesystem and git's internal
representation.
I think my "keep the normalized names alongside index entries" approach
might still work there. But it means that we compare against the "real"
byte-wise names on the tree side, and against the normalized names on
the path side. But that means having two comparison/lookup functions for
the index, and always using the right one. And algorithms that rely on
traversing two sorted lists cannot work in both directions.
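To make the two-lookup idea concrete, here is a rough sketch in Python (not git's C code). The class and method names (IndexSketch, lookup_tree, lookup_fs) are made up for illustration, and NFC stands in for "precomposed". It also shows why a sorted-list walk is a problem: the same names sort differently byte-wise and by normalized form.

```python
import unicodedata


def nfc(name: str) -> str:
    """Return the precomposed (NFC) form of a name."""
    return unicodedata.normalize("NFC", name)


class IndexSketch:
    """Toy index: exact names plus a normalized-name side table.

    Hypothetical structure for illustration only; git's real index
    is a sorted array of cache entries.
    """

    def __init__(self):
        self.entries = set()       # exact names, as they appear in trees
        self.by_normalized = {}    # NFC name -> exact name

    def add(self, name: str) -> None:
        self.entries.add(name)
        self.by_normalized[nfc(name)] = name

    def lookup_tree(self, name: str):
        # Tree side: compare the "real" byte-wise name.
        return name if name in self.entries else None

    def lookup_fs(self, name: str):
        # Filesystem side: compare normalized names.
        return self.by_normalized.get(nfc(name))


def bytewise_order(names):
    return sorted(names, key=lambda n: n.encode("utf-8"))


def normalized_order(names):
    return sorted(names, key=lambda n: nfc(n).encode("utf-8"))


NFD_NAME = "e\u0301.txt"   # 'e' + combining acute accent (decomposed)
NFC_NAME = "\u00e9.txt"    # precomposed e-acute

index = IndexSketch()
index.add(NFD_NAME)
```

With this, index.lookup_tree(NFC_NAME) misses while index.lookup_fs(NFC_NAME) finds the NFD entry, and bytewise_order([NFD_NAME, "f.txt"]) puts the NFD name first while normalized_order puts it last. A merge-style walk over two sorted lists has to pick one of those orders.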
> Do we have a similar issue for older commits in a history under
> "ignore-case" as well?
I don't think so, because we handle ignorecase completely differently.
There we use the name-hash with a case-insensitive hash and a
case-insensitive comparison function. And we use strcasecmp liberally
throughout the code.
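Roughly, the ignorecase scheme looks like the sketch below. This is Python for brevity and is not the real name-hash code; NameHashSketch and its helpers are hypothetical, and str.lower() is only a stand-in for whatever case folding the C code does.

```python
class NameHashSketch:
    """Toy hash table with a case-insensitive hash and comparison."""

    def __init__(self):
        self.buckets = {}   # case-insensitive hash -> list of stored names

    @staticmethod
    def _icase_hash(name: str) -> int:
        # Hash the folded name so "Makefile" and "MAKEFILE" share a bucket.
        return hash(name.lower())

    @staticmethod
    def _icase_equal(a: str, b: str) -> bool:
        # Plays the role of strcasecmp(a, b) == 0.
        return a.lower() == b.lower()

    def add(self, name: str) -> None:
        self.buckets.setdefault(self._icase_hash(name), []).append(name)

    def find(self, name: str):
        # Return the stored spelling that matches, ignoring case.
        for candidate in self.buckets.get(self._icase_hash(name), []):
            if self._icase_equal(candidate, name):
                return candidate
        return None


name_hash = NameHashSketch()
name_hash.add("Makefile")
```

Here name_hash.find("MAKEFILE") returns the stored "Makefile". The stored names stay byte-exact, and only the hash and the comparison fold case.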
I don't think we have a "str_utf8_cmp" that ignores normalizations (or
maybe strcoll will do this?). But in theory we could use it everywhere
we use strcasecmp for ignore_case. And then we would not need to have
our readdir wrapper, maybe? I admit I haven't thought that much about
_either_ approach. But aside from some bugs in the hash system, I do not
recall seeing any design problems in the ignorecase code.
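For the sake of discussion, a "str_utf8_cmp" could behave like the sketch below. It is Python, not C, and the function is hypothetical (nothing like it exists in git as far as I know). It assumes both inputs are valid UTF-8 and treats NFC as the canonical form.

```python
import unicodedata


def str_utf8_cmp(a: bytes, b: bytes) -> int:
    """Compare two UTF-8 names, ignoring normalization differences.

    Returns a negative, zero, or positive value like strcmp().
    Assumes valid UTF-8 input; decode() raises on anything else.
    """
    na = unicodedata.normalize("NFC", a.decode("utf-8")).encode("utf-8")
    nb = unicodedata.normalize("NFC", b.decode("utf-8")).encode("utf-8")
    return (na > nb) - (na < nb)


decomposed = "e\u0301".encode("utf-8")    # 'e' + combining acute accent
precomposed = "\u00e9".encode("utf-8")    # single precomposed code point
```

With this, str_utf8_cmp(decomposed, precomposed) is 0 even though the byte strings differ, the same way strcasecmp treats "A" and "a" as equal.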