https://bz.mercurial-scm.org/show_bug.cgi?id=6968

            Bug ID: 6968
           Summary: Unicode normalization in file names
           Product: Mercurial
           Version: 6.8.1
          Hardware: PC
                OS: NetBSD
            Status: UNCONFIRMED
          Severity: feature
          Priority: wish
         Component: Mercurial
          Assignee: bugzi...@mercurial-scm.org
          Reporter: mercurial-bugzi...@campbell.mumble.net
                CC: mercurial-de...@mercurial-scm.org
    Python Version: ---

I would like to make sure a repository is safe to use across multiple file
systems and operating systems, including ufs/ffs and similar Unix-oriented file
systems (where names are a bag of bytes), zfs with utf8only, Apple HFS+, Apple
APFS, and others. I am willing to accept some constraints, set some
configuration options, and install hooks that enforce rules.

Here is how I think I would like it to work, though I haven't tested any of it
yet:

1. Internally, the repository stores only files whose paths are valid UTF-8
strings in NFC.
2. When hg operates on a file in the file system, it uses NFC paths.
3. When hg lists directories to discover new files, it normalizes them into NFC
(and rejects/ignores files whose names have invalid UTF-8).
4. No tree may contain two paths that are equivalent modulo normalization and
case.
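Rules (1) and (4) are mechanical enough to check in code. Below is a minimal
Python sketch, not an existing hg feature: the function names are hypothetical,
it assumes Python 3.8+ (for unicodedata.is_normalized), and it approximates
"equivalent modulo normalization and case" with Unicode's canonical caseless
match, NFD(casefold(NFD(x))). It also compares directory prefixes, so
"Docs/a" and "docs/b" count as a collision. The folding is deliberately
conservative: HFS+ and APFS have their own rules, so it may reject some pairs
(e.g. "straße" vs "STRASSE") that a particular file system would keep apart.

```python
import unicodedata
from typing import Dict, Iterable, List


def caseless_key(name: str) -> str:
    # Unicode "canonical caseless match" key: NFD(casefold(NFD(name))).
    # Two names with the same key differ only in normalization and/or case.
    return unicodedata.normalize("NFD", unicodedata.normalize("NFD", name).casefold())


def check_paths(paths: Iterable[bytes]) -> List[str]:
    """Check repo-relative byte paths against rules 1 and 4.

    Returns human-readable problems; an empty list means the set is clean.
    """
    problems: List[str] = []
    seen: Dict[str, str] = {}  # caseless key -> first spelling seen
    for raw in paths:
        try:
            name = raw.decode("utf-8")  # rule 1: valid UTF-8 only
        except UnicodeDecodeError:
            problems.append(f"invalid UTF-8: {raw!r}")
            continue
        if not unicodedata.is_normalized("NFC", name):  # rule 1: NFC only
            problems.append(f"not NFC: {name!r}")
        # Rule 4: check every directory prefix as well as the full path.
        parts = name.split("/")
        for i in range(1, len(parts) + 1):
            prefix = "/".join(parts[:i])
            first = seen.setdefault(caseless_key(prefix), prefix)
            msg = f"collision: {prefix!r} vs {first!r}"
            if first != prefix and msg not in problems:
                problems.append(msg)
    return problems
```

For example, check_paths([b"README", b"readme"]) reports one collision, while
a listing of distinct NFC names returns an empty list.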

I reviewed https://wiki.mercurial-scm.org/EncodingStrategy, but I'm not sure it
addresses how to achieve this. I believe git's core.precomposeUnicode=true
option does (3):
https://git-scm.com/docs/git-config/2.47.1#Documentation/git-config.txt-coreprecomposeUnicode
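For comparison, a rough equivalent of (3) might look like the Python sketch
below. The helper names are hypothetical and this is not an hg API. It lists a
directory as raw bytes, converts each UTF-8 name to NFC, and sets aside names
that are not valid UTF-8. Two caveats: git documents precomposeUnicode as used
only by its Mac OS implementation, and on a bag-of-bytes file system the NFC
spelling may not open a file that was created under an NFD name. Also, two
distinct on-disk names can normalize to the same NFC string, which this sketch
does not detect.

```python
import os
import unicodedata
from typing import List, Tuple


def normalize_listing(entries: List[bytes]) -> Tuple[List[str], List[bytes]]:
    """Split raw directory entries into NFC names and rejected raw names."""
    names: List[str] = []
    rejected: List[bytes] = []
    for raw in entries:
        try:
            names.append(unicodedata.normalize("NFC", raw.decode("utf-8")))
        except UnicodeDecodeError:
            rejected.append(raw)  # rule 3: ignore names that are not UTF-8
    return names, rejected


def list_dir_nfc(path: bytes) -> Tuple[List[str], List[bytes]]:
    # Passing bytes to os.listdir returns entry names exactly as the
    # file system reports them, without any decoding.
    return normalize_listing(os.listdir(path))
```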

Some constraints that may make this simpler than a grand unified theory of
pathname encoding questions:

- The makefile issue is not relevant at present -- non-ASCII file names won't
appear in makefiles.
- A central server will enforce rules on changesets when they are pushed.
- Users can be asked to use particular .hg/hgrc configuration and hooks (though
ideally it would be just a .hg/hgrc config line).
- I can rewrite the complete existing history for now. That will no longer be
possible after the flag day of conversion, so I want careful, thoroughly tested
input validation in place to make sure non-conforming names don't become a
problem in the future.
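On the central server, the rules could be enforced with an in-process Python
hook. This is an illustrative sketch, not an existing hg extension: the file
name nfcpolicy.py and both function names are hypothetical. It relies only on
the usual in-process hook conventions (arguments passed as keywords, including
node, the first incoming changeset, and a truthy return value making a pretxn*
hook reject the transaction). It re-checks the full manifest of every incoming
changeset, which is simple but may be slow on large repositories.

```python
import unicodedata
from typing import Iterable, List


def manifest_problems(paths: Iterable[bytes]) -> List[bytes]:
    """Flag non-UTF-8 names, non-NFC names, and case/normalization clashes."""
    problems: List[bytes] = []
    seen = {}  # canonical caseless key -> first raw path with that key
    for raw in paths:
        try:
            name = raw.decode("utf-8")
        except UnicodeDecodeError:
            problems.append(b"invalid UTF-8: " + raw)
            continue
        if not unicodedata.is_normalized("NFC", name):
            problems.append(b"not NFC: " + raw)
        # NFD(casefold(NFD(name))): equal keys differ only by case/normalization.
        key = unicodedata.normalize("NFD", unicodedata.normalize("NFD", name).casefold())
        first = seen.setdefault(key, raw)
        if first != raw:
            problems.append(b"collides with " + first + b": " + raw)
    return problems


def hook(ui, repo, node=None, **kwargs):
    """Hypothetical pretxnchangegroup hook; a truthy return rejects the push."""
    failed = False
    # `node` is the first incoming changeset; the rest follow it in rev order.
    for rev in range(repo[node].rev(), len(repo)):
        ctx = repo[rev]
        for msg in manifest_problems(ctx.manifest()):
            ui.warn(b"%s: %s\n" % (ctx.hex()[:12], msg))
            failed = True
    return failed
```

It could be enabled with a line such as
pretxnchangegroup.nfcpolicy = python:/path/to/nfcpolicy.py:hook
in the [hooks] section of the server's hgrc. The same function could probably
also be registered as a client-side pretxncommit hook, but that is unverified.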

So: Are there any existing hg config options I can enable for this, or for a
similar goal that prior experience suggests is better?

_______________________________________________
Mercurial-devel mailing list
Mercurial-devel@lists.mercurial-scm.org
https://lists.mercurial-scm.org/mailman/listinfo/mercurial-devel
