On Sun, May 04, 2014 at 08:13:15AM +0200, Torsten Bögershausen wrote:
> > 1. Tell everyone that NFD in the git repo is wrong, and
> > they should make a new commit to normalize all their
> > in-repo files to be precomposed.
> > This is probably not the right thing to do, because it
> > still doesn't fix checkouts of old history. And it
> > spreads the problem to people on byte-preserving
> > filesystems (like ext4), because now they have to start
> > precomposing their filenames as they are added to git.
> I'm not sure if I follow. People running ext4 (or Linux in general,
> or Windows, or Unix) do not suffer from the file system
> "feature" of Mac OS, which accepts precomposed/decomposed Unicode
> but returns decomposed.
What I mean by "spreads the problem" is that git on Linux does not need
to care about utf8 at all. It treats filenames as a byte sequence. But
if we were to start enforcing "filenames should be precomposed utf8",
then people adding files on Linux would want to enforce that, too.
People on Linux could ignore the issue as they do now, but they would
then create problems for OS X users if they add decomposed filenames.
IOW, if the OS X code assumes "all repo filenames are precomposed", then
other systems become a possible vector for violating that assumption.
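To make the byte-level difference concrete, here is a small Python sketch (not git code). It assumes filenames are valid UTF-8 and uses Python's NFC/NFD as a stand-in for "precomposed"/"decomposed"; HFS+ uses its own NFD variant, so treat this only as an approximation. The helper names are made up for illustration.

```python
import unicodedata

# The same visible filename ("ä.txt") in two Unicode normalization forms.
precomposed = unicodedata.normalize("NFC", "\u00e4.txt")  # one code point for "ä"
decomposed = unicodedata.normalize("NFD", "\u00e4.txt")   # "a" + combining diaeresis

# A byte-preserving filesystem such as ext4 sees two distinct names.
nfc_bytes = precomposed.encode("utf-8")
nfd_bytes = decomposed.encode("utf-8")


def same_bytes(a: bytes, b: bytes) -> bool:
    """Plain byte-sequence comparison, with no knowledge of UTF-8."""
    return a == b


def same_after_precompose(a: bytes, b: bytes) -> bool:
    """Hypothetical normalization-aware comparison (assumes valid UTF-8)."""
    na = unicodedata.normalize("NFC", a.decode("utf-8"))
    nb = unicodedata.normalize("NFC", b.decode("utf-8"))
    return na == nb
```

With byte-wise comparison the two names are simply different files, which is why a decomposed name added on Linux would break an "everything in the repo is precomposed" assumption on OS X.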
> > 3. Convert index filenames to their precomposed form when
> > we read the index from disk. This would be efficient,
> > but we would have to be careful not to write the
> > precomposed forms back out to disk.
> How could we be careful?
> Mac OS always writes decomposed Unicode to disk.
> (And all other OSes tend to use precomposed forms, mainly because the "keyboard
> driver" generates them.)
Sorry, I should have been more clear here. I meant "do not write index
entries using the precomposed forms out to the on-disk index". Because
that would mean that git silently converts your filenames, and it would
look like you have changes to commit whenever you read in a tree with a
decomposed filename.
Looking over the patch you sent earlier, I suspect that is part of its
problem (it stores the converted name in the index entry's name field).
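Roughly what I have in mind, as a Python sketch rather than actual git code: keep the name exactly as it was read, and carry the precomposed form only as an in-memory lookup key. IndexEntry, read_index and write_index here are hypothetical names, not git's structures, and the sketch assumes the names are valid UTF-8.

```python
import unicodedata
from dataclasses import dataclass


@dataclass
class IndexEntry:
    name: bytes       # exactly the bytes stored in the on-disk index
    lookup_key: str   # precomposed form, kept in memory only


def precompose(raw: bytes) -> str:
    """Return the NFC form of a UTF-8 filename (assumes valid UTF-8)."""
    return unicodedata.normalize("NFC", raw.decode("utf-8"))


def read_index(on_disk_names):
    """Build in-memory entries without altering the stored names."""
    return [IndexEntry(name=n, lookup_key=precompose(n)) for n in on_disk_names]


def write_index(entries):
    """Write back only the original bytes, never the lookup key."""
    return [e.name for e in entries]
```

Because write_index() only ever emits the name field, a read followed by a write is a no-op, so nothing shows up as a change to commit.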
> This is my understanding:
> Some possible fixes are:
> 1. Accept that NFD in a Git repo which is shared between Mac OS
> and Linux or Windows is problematic.
> Whenever core.precomposeunicode = true, do the following:
> Let Git under Mac OS change all file names in the index
> into the precomposed form when a new commit is done.
> This is probably not a wrong thing to do.
> When the index file is read into memory, precompose the file names and
> compare them with the precomposed form coming from precompose_utf8_readdir().
> This prevents decomposed file names from being reported as untracked by "git
> status".
This is the case I was specifically thinking of above (and I think what
your patch is doing).
> 2. Do all index filename comparisons under Mac OS X using a UTF-8 aware
> comparison function regardless of whether core.precomposeunicode is set.
> This would probably have bad performance, and somewhat
> defeats the point of converting the filenames at the
> readdir level in the first place.
Right, I'm concerned about performance here, but I wonder if we can
reuse the name-hash solutions from ignorecase.
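As a rough illustration of the idea (a Python sketch, loosely inspired by the ignorecase name-hash but not modeled on its actual code): hash each index name by its precomposed form once, so that later lookups avoid a normalization-aware comparison against every entry. NameHash, nfc_key and untracked are hypothetical names, and valid UTF-8 is assumed.

```python
import unicodedata
from collections import defaultdict


def nfc_key(name: bytes) -> str:
    """Hash key: the precomposed form of a UTF-8 filename."""
    return unicodedata.normalize("NFC", name.decode("utf-8"))


class NameHash:
    """Maps a precomposed key to the index names that normalize to it."""

    def __init__(self, index_names):
        self._by_key = defaultdict(list)
        for name in index_names:
            self._by_key[nfc_key(name)].append(name)

    def lookup(self, fs_name: bytes):
        """Return the stored index names equivalent to fs_name, if any."""
        return list(self._by_key.get(nfc_key(fs_name), []))


def untracked(fs_names, index_names):
    """Filesystem names with no equivalent index entry."""
    table = NameHash(index_names)
    return [n for n in fs_names if not table.lookup(n)]
```

The index entries themselves stay untouched; only the lookup table knows about normalization, so a precomposed name coming from readdir still finds a decomposed entry.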