Hello, I'm one of the maintainers of the Ceph file system (CephFS). I recently introduced case-insensitive directory trees into CephFS to support performance improvements for Samba. Perhaps my experience would add to this timely discussion.
[ A brief context/background on that effort for those interested:

CephFS has metadata on directories, inherited at mkdir, which adds "normalization" (string: Unicode normalization type) or "casesensitive" (bool) and permanently affects how dentries are looked up. The directory must be empty and free of snapshots to change this metadata ("charmap").

Clients with support for this feature will perform a mapping [1] from the application (userspace libcephfs or ceph-fuse) path namespace to the MDS (metadata server) path namespace during a path walk or lookup. For affected directories, the MDS namespace is the normalized and possibly case-folded name for each dentry.

The MDS also stores an uninterpreted "alternate_name" with each dentry that is the original name from the application namespace used to create the dentry. The alternate_name is **only** visible via file system operations that expose dentry names to the application, readdir/getcwd [2].

Benefits I identified for this approach were:

- The core file system (mainly: MDS) paths were virtually unchanged. The MDS does not care about case sensitivity or normalization. It uses what the client gave it for the dentry name.
- It's the client's job to perform any namespace mapping and store metadata in alternate_name to reverse the mapping (i.e. get the original dentry name used to create the dentry).
- The Client's cache uses the same file system namespace as the MDS: dentry names are normalized/case-folded. The transformation is only applied during path walk / lookup with user-supplied paths/names.

We use libicu (indirectly through boost) for actually doing the normalization / casefolding. It's simple [3]. It works. Despite your (Linus) objections to this in prior postings, I do not see this as problematic. There are backwards compatibility guarantees in the standard [4]. Does that mean mistakes can't happen? No.
Certainly there could be a backwards-compatibility breakage where we have two physical dentries with names that should be equivalent: one dentry shadows the other for some clients with upgraded Unicode tables. Even so, I do not see this as a significant barrier to adopting the Unicode routines.

In my opinion, the real danger is a file system person foolishly thinking they know best by rolling their own mapping table and discovering why that's a terrible idea. I think this thread illustrates that in several places. ]

On Fri, Apr 25, 2025 at 11:41 PM Linus Torvalds <torva...@linux-foundation.org> wrote:
>
> On Fri, 25 Apr 2025 at 20:09, Kent Overstreet <kent.overstr...@linux.dev>
> wrote:
> >
> > The subject is CI lookups, and I'll eat my shoe if you wrote that.
>
> Start chomping. That nasty code with d_compare and d_hash goes way back.
>
> From a quick look, it's from '97, and got merged in in 2.1.50. It was
> added (obviously) for FAT. Back then, that was the only case that
> wanted it.
>
> I don't have any archives from that time, and I'm sure others were
> involved, but that whole init_name_hash / partial_name_hash /
> end_name_hash pattern in 2.1.50 looks like code I remember. So I was
> at least part of it.
>
> The design, if you haven't figured it out yet, is that filesystems
> that have case-independent name comparisons can do their own hash
> functions and their own name comparison functions, exactly so that one
> dentry can match multiple different strings (and different strings can
> hash to the same bucket).
>
> If you get dentry aliases, you may be doing something wrong.

I would not consider myself a kernel developer but I assume this terminology (dentry aliases) refers to multiple dentries in the dcache referring to the same physical dentry on the backing file system? If so, I can't convince myself that's a real problem. Wouldn't this be beneficial because each application/process may utilize a different name for the backing file system dentry?
This keeps the cache hot with relevant names without any need to do transformations on the dentry names.

Happy to learn otherwise because I expected this situation to occur in practice with ceph-fuse. I just tested and the number of dcache entries (/proc/sys/fs/dentry-state) increases as expected when performing case permutations on a case-insensitive file name. I didn't observe any cache inconsistencies when editing/removing these dentries. The danger perhaps is cache pollution and some kind of DoS? That should be a solvable problem but perhaps I misunderstand some complexity.

> Also, originally this was all in the same core dcache lookup path. So
> the whole "we have to check if the filesystem has its own hash
> function" ended up slowing down the normal case. It's obviously been
> massively modified since 1997 ("No, really?"), and now the code is
> very much set up so that the straight-line normal case is all the
> non-CI cases, and then case idnependence ends up out-of-line with its
> own dcache hash lookup loops so that it doesn't affect the normal good
> case.

It seems to me this is a good argument for keeping case-sensitivity awareness out of the dcache. Let the fs do the namespace mapping and accept that you may have dentry aliases.

FWIW, I also wish we didn't have to deal with case-sensitivity but we have users/protocols to support (as usual).

[1] https://github.com/ceph/ceph/blob/ebb2f72bfc37577d5389809ba0c16fca032acd8a/src/client/Client.cc#L7732-L7735
[2] https://github.com/ceph/ceph/blob/ebb2f72bfc37577d5389809ba0c16fca032acd8a/src/client/Client.cc#L1353
[3] https://github.com/ceph/ceph/blob/ebb2f72bfc37577d5389809ba0c16fca032acd8a/src/client/Client.cc#L1299-L1340
[4] https://unicode.org/reports/tr15/#Versioning

Kind regards,
--
Patrick Donnelly
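
For readers who want a concrete picture of the client-side mapping described in the background note above, here is a minimal Python sketch. It is not the CephFS code (that is C++ and goes through ICU via Boost, see [3]). The names `mds_name` and `Directory` are hypothetical, Python's `unicodedata.normalize` and `str.casefold` stand in for the ICU routines, and NFD is an arbitrary choice of normalization form for the example.

```python
import unicodedata


def mds_name(app_name, normalization="NFD", casesensitive=False):
    """Map an application-supplied name to the name used on the MDS side.

    Hypothetical helper: illustrates "normalize, then optionally case-fold".
    """
    name = unicodedata.normalize(normalization, app_name)
    if not casesensitive:
        # Case folding can produce unnormalized text, so normalize again.
        name = unicodedata.normalize(normalization, name.casefold())
    return name


class Directory:
    """Toy directory: keyed by the mapped name, remembers the original name."""

    def __init__(self, normalization="NFD", casesensitive=False):
        self.normalization = normalization
        self.casesensitive = casesensitive
        # mapped name -> original spelling (plays the role of alternate_name)
        self.dentries = {}

    def _key(self, app_name):
        return mds_name(app_name, self.normalization, self.casesensitive)

    def create(self, app_name):
        key = self._key(app_name)
        if key in self.dentries:
            raise FileExistsError(app_name)
        self.dentries[key] = app_name

    def lookup(self, app_name):
        # The mapping is applied only to the user-supplied name.
        return self._key(app_name) in self.dentries

    def readdir(self):
        # Listing is the only place the original spelling is exposed.
        return sorted(self.dentries.values())
```

With this sketch, creating "Résumé.TXT" in a `Directory()` makes lookups of "résumé.txt" and of the decomposed spelling "RE\u0301SUME\u0301.txt" succeed, a second `create` under either spelling raises `FileExistsError`, and `readdir()` still returns the original "Résumé.TXT". The store itself never interprets the names it is handed, which is the property the benefits list above attributes to the MDS.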