Ces VLC wrote:
> On Sat, Jul 4, 2020 at 2:43 AM Jim DeLaHunt <list+macports-users at jdlh.com>
> wrote:
>>
>> [...] I hope you see the distinctions I'm trying to explain. And, I hope
>> this helps you figure out a solution. Please let the list know what you
>> find out.
>>
>
> Thanks a lot, Ryan and Jim, for your messages and for the great information
> you provided. It's very complete, and, yes, Jim, what you described is the
> cause of the problem: rsync just transmits file names as verbatim raw
> sequences of bytes with no conversion at all.
>
> IMHO, the correct way of fixing this shouldn't be by manually converting
> the encodings yourself with the '--iconv' flag, but actually with a flag
> for performing the check after normalization, which AFAIK doesn't exist (it
> wouldn't matter what normalization, just apply the same normalization to
> all file names before comparing them, and then discard the normalization).
> What I mean is, what's the purpose of rsync considering as different two
> files whose name is identical when being displayed in a terminal? Two
> identical text strings can be normalized in different ways (for example:
> accents in separated codes, or in composed codes), but they are the same
> text. So, if the text is the same, why consider them as different file
> names?

Sounds like a perfectly valid feature request for the rsync project.

> I don't understand why such '--normalize-before-compare' flag doesn't exist
> (I insist: no need to specify the normalization algorithm, just apply the
> same algorithm to all file names). It would fix all these problems in an
> elegant and clean way, and, BTW, this would be the behaviour everybody
> expects, if I'm not missing any point here.

It probably just didn't come up before APFS became widespread on macOS.
And still doesn't come up if all your filenames are ASCII.

This behaviour has the slight disadvantage of being technically incorrect
on normalization-sensitive filesystems. On your typical Linux system, it's
entirely possible to have two filenames that differ only in normalization.
And you know if it's possible, then someone somewhere has a workflow that
depends on it.

It might make sense to have normalize-before-compare turned on by default
on Darwin, and off by default elsewhere, with a flag to enable or disable
as needed. As you say, it could sometimes be preferable behaviour even on
normalization-sensitive systems.

- Josh
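
P.S. To make the idea concrete, here is a rough Python sketch of what
"normalize before compare" could look like. This is not rsync code, and
rsync has no such option as far as this thread knows; the function names
(normalized_key, match_names) are made up for illustration, and NFC is an
arbitrary choice, since any single form applied to both sides would do.
It also reports the case mentioned above, where a normalization-sensitive
filesystem holds two names that differ only in normalization.

```python
import unicodedata


def normalized_key(name: str, form: str = "NFC") -> str:
    """Return the name in one fixed normalization form, used only for comparing."""
    return unicodedata.normalize(form, name)


def match_names(source_names, dest_names, form="NFC"):
    """Pair each source name with a destination name that is equal after
    normalization (hypothetical helper, not part of rsync).

    Returns (matches, collisions):
      matches    maps each source name to the matching destination name, or None
      collisions maps a normalized key to the destination names that share it
    """
    dest_by_key = {}
    collisions = {}
    for name in dest_names:
        key = normalized_key(name, form)
        if key in dest_by_key:
            # Two distinct on-disk names collapse to the same key.
            collisions.setdefault(key, [dest_by_key[key]]).append(name)
        else:
            dest_by_key[key] = name
    matches = {
        src: dest_by_key.get(normalized_key(src, form)) for src in source_names
    }
    return matches, collisions


# The same visible text, stored two ways.
composed = "caf\u00e9.txt"      # e-acute as a single code point
decomposed = "cafe\u0301.txt"   # plain "e" followed by a combining acute accent

matches, collisions = match_names([composed], [decomposed])
print(composed == decomposed)   # raw comparison, as rsync does today
print(matches[composed] == decomposed)
```

The original names are kept as-is and the normalized form is used only as
a lookup key, which is the "discard the normalization" part of the
proposal. The collisions result is where a default-off setting on Linux
would matter: those are the files a normalizing comparison cannot tell
apart.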
