On Sun, Apr 27, 2025 at 09:43:43PM -0400, Kent Overstreet wrote:
> On Sun, Apr 27, 2025 at 06:30:59PM -0700, Eric Biggers wrote:
> > On Sun, Apr 27, 2025 at 08:55:30PM -0400, Kent Overstreet wrote:
> > > The thing is, that's exactly what we're doing. ext4 and bcachefs both
> > > refer to a specific revision of the folding rules: for ext4 it's
> > > specified in the superblock, for bcachefs it's hardcoded for the moment.
> > >
> > > I don't think this is the ideal approach, though.
> > >
> > > That means the folding rules are "whatever you got when you mkfs'd".
> > > Think about what that means if you've got a fleet of machines, of
> > > different ages, but all updated in sync: that's a really annoying way
> > > for gremlins of the "why does this machine act differently" variety to
> > > creep in.
> > >
> > > What I'd prefer is for the unicode folding rules to be transparently and
> > > automatically updated when the kernel is updated, so that behaviour
> > > stays in sync. That would behave more the way users would expect.
> > >
> > > But I only gave this real thought just over the past few days, and doing
> > > this safely and correctly would require some fairly significant changes
> > > to the way casefolding works.
> > >
> > > We'd have to ensure that lookups via the case sensitive name always
> > > works, even if the casefolding table the dirent was created with give
> > > different results that the currently active casefolding table.
> > >
> > > That would require storing two different "dirents" for each real dirent,
> > > one normalized and one un-normalized, because we'd have to do an
> > > un-normalized lookup if the normalized lookup fails (and vice versa).
> > > Which should be completely fine from a performance POV, assuming we have
> > > working negative dentries.
> > >
> > > But, if the unicode folding rules are stable enough (and one would hope
> > > they are), hopefully all this is a non-issue.
> > >
> > > I'd have to gather more input from users of casefolding on other
> > > filesystems before saying what our long term plans (if any) will be.
> >
> > Wouldn't lookups via the case-sensitive name keep working even if the
> > case-insensitivity rules change? It's lookups via a case-insensitive name
> > that could start producing different results. Applications can depend on
> > case-insensitive lookups being done in a certain way, so changing the
> > case-insensitivity rules can be risky.
>
> No, because right now on a case-insensitive filesystem we _only_ do the
> lookup with the normalized name.

Well, changing the case-insensitivity rules on an existing filesystem breaks
the directory indexing, so when the filesystem does an indexed lookup in a
directory it might no longer look in the right place. But if the dentry were
to be examined regardless, it would still match. (Again, assuming that the
lookup uses a name that is case-sensitively the same as the name the file was
created with. If it's not case-sensitively the same, that's another story.)

ext4 and f2fs recently added a fallback to a linear search for dentries in
"casefolded" directories, which handles this by no longer relying solely on
the directory indexing. See commits 9e28059d56649 and 91b587ba79e1b.

- Eric
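
To make the failure mode above concrete, here is a toy sketch in Python. It is
not how ext4 or f2fs implement anything: the dict keyed by the folded name is
only a stand-in for an on-disk directory index, and ToyDir, fold_v1 and
fold_v2 are hypothetical names made up for illustration. The two fold
functions play the role of two revisions of the folding rules that disagree
on "ß".

```python
# Toy model only. The dict keyed by the folded name stands in for a
# directory index; fold_v1/fold_v2 are hypothetical rule "revisions".

def fold_v1(name: str) -> str:
    # Older rules (assumed): plain lowercasing, "ß" stays "ß".
    return name.lower()

def fold_v2(name: str) -> str:
    # Newer rules (assumed): full case folding, "ß" becomes "ss".
    return name.casefold()

class ToyDir:
    def __init__(self, fold):
        self.fold = fold      # currently active folding rules
        self.index = {}       # folded name -> stored name
        self.entries = []     # every stored name, in creation order

    def create(self, name):
        # The index key is computed with the rules active at creation time.
        self.index[self.fold(name)] = name
        self.entries.append(name)

    def lookup_indexed(self, name):
        # Index-only lookup: looks in the wrong place once the rules change.
        return self.index.get(self.fold(name))

    def lookup_with_fallback(self, name):
        # Try the index first, then examine every entry linearly.
        hit = self.lookup_indexed(name)
        if hit is not None:
            return hit
        want = self.fold(name)
        for stored in self.entries:
            if self.fold(stored) == want:
                return stored
        return None

d = ToyDir(fold_v1)
d.create("Straße.txt")
d.fold = fold_v2  # the rules change underneath an existing index

indexed_result = d.lookup_indexed("Straße.txt")
fallback_result = d.lookup_with_fallback("Straße.txt")
```

In this sketch the index-only lookup misses even though the name is
byte-for-byte the one the file was created with, while the linear scan still
finds it because both sides of the comparison are folded with the same
(current) rules.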