Re: [GIT PULL] bcachefs fixes for 6.15-rc4

Patrick Donnelly Tue, 29 Apr 2025 08:37:13 -0700

Hello,

I'm one of the maintainers of the Ceph file system (CephFS). I
recently introduced case-insensitive directory trees into CephFS to
support performance improvements for Samba. Perhaps my experience
would add to this timely discussion.

[

A brief context/background on that effort for those interested:

CephFS supports directory metadata, inherited at mkdir, that sets
"normalization" (string: Unicode normalization type) or
"casesensitive" (bool); these settings permanently affect how dentries
are looked up. The directory must be empty and free of snapshots to
change this metadata ("charmap").
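
To make the charmap rules concrete, here is a minimal Python model, assuming a toy in-memory directory tree. `Charmap`, `Directory`, `mkdir` and `set_charmap` are hypothetical names chosen for illustration, not CephFS interfaces. The sketch only encodes the two rules stated above: a new directory inherits its parent's charmap at mkdir, and the charmap can only change on an empty, snapshot-free directory.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Optional


@dataclass
class Charmap:
    # Hypothetical model of the per-directory "charmap" metadata.
    normalization: str  # Unicode normalization form, e.g. "NFD"
    casesensitive: bool


@dataclass
class Directory:
    charmap: Optional[Charmap] = None
    entries: Dict[str, "Directory"] = field(default_factory=dict)
    has_snapshots: bool = False

    def mkdir(self, name: str) -> "Directory":
        # The child receives its own copy of the parent's charmap at mkdir time.
        inherited = replace(self.charmap) if self.charmap is not None else None
        child = Directory(charmap=inherited)
        self.entries[name] = child
        return child

    def set_charmap(self, charmap: Charmap) -> None:
        # Existing dentries were stored under the old mapping, so a change is
        # refused unless the directory is empty and has no snapshots.
        if self.entries or self.has_snapshots:
            raise PermissionError("directory must be empty and free of snapshots")
        self.charmap = charmap
```

Copying the charmap at mkdir (rather than sharing one object) matches the "inherited at creation, then fixed" semantics: later changes to an empty parent do not silently alter existing children.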

Clients with support for this feature will perform a mapping [1] from
the application (userspace libcephfs or ceph-fuse) path namespace to
the MDS (metadata server) path namespace during a path walk or lookup.
For affected directories, the MDS namespace is the normalized and
possibly case-folded name for each dentry. The MDS also stores an
uninterpreted "alternate_name" with each dentry that is the original
name from the application namespace used to create the dentry. The
alternate_name is **only** visible via file system operations that
expose dentry names to the application, such as readdir/getcwd [2].
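
A minimal sketch of that client-side mapping, assuming Python's `unicodedata.normalize` and `str.casefold` as stand-ins for the ICU routines CephFS actually uses. `to_mds_name` and `MdsDirectory` are hypothetical names. The point is that the MDS keys dentries by the mapped name and keeps the original spelling only as an opaque alternate_name, which readdir hands back.

```python
import unicodedata
from typing import Dict, List, Optional


def to_mds_name(name: str, normalization: str, casesensitive: bool) -> str:
    """Map an application-namespace name into the MDS namespace (sketch)."""
    mapped = unicodedata.normalize(normalization, name)
    if not casesensitive:
        # str.casefold() stands in for ICU case folding; re-normalize because
        # folding can produce a string that is no longer normalized.
        mapped = unicodedata.normalize(normalization, mapped.casefold())
    return mapped


class MdsDirectory:
    """Hypothetical MDS directory: names are stored without interpretation."""

    def __init__(self) -> None:
        self.dentries: Dict[str, str] = {}  # MDS name -> alternate_name

    def create(self, mds_name: str, alternate_name: str) -> None:
        if mds_name in self.dentries:
            raise FileExistsError(mds_name)
        self.dentries[mds_name] = alternate_name

    def lookup(self, mds_name: str) -> Optional[str]:
        return self.dentries.get(mds_name)

    def readdir(self) -> List[str]:
        # readdir-style operations hand back the original spellings.
        return sorted(self.dentries.values())
```

Because the MDS only ever sees mapped names, creating "makefile" after "Makefile" in a case-insensitive directory fails as an ordinary duplicate, without the MDS knowing anything about case.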

Benefits I identified for this approach were:

- The core file system (mainly: MDS) paths were virtually unchanged.
The MDS does not care about case sensitivity or normalization. It uses
what the client gave it for the dentry name.
- It's the client's job to perform any namespace mapping and store
metadata in alternate_name to reverse the mapping (i.e. get the
original dentry name used to create the dentry).
- The client's cache uses the same file system namespace as the MDS:
dentry names are normalized/case-folded. The transformation is only
applied during path walk / lookup with user-supplied paths/names.
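
The third point can be sketched as a path walk whose cache is keyed only by mapped names, assuming every directory uses NFD normalization and is case-insensitive. `Client`, `walk` and the nested-dict tree are hypothetical. The sketch shows that differently cased spellings of a path share one set of cache entries and cause no extra MDS round trips; a missing component simply raises `KeyError`.

```python
import unicodedata
from typing import Dict, Tuple


def to_mds_name(name: str) -> str:
    # Assumed charmap for every directory in this sketch: NFD + case-insensitive.
    return unicodedata.normalize("NFD", unicodedata.normalize("NFD", name).casefold())


class Client:
    """Hypothetical client whose cache is keyed only by MDS-namespace names."""

    def __init__(self, mds_tree: Dict[str, dict]) -> None:
        self.mds_tree = mds_tree  # nested dicts keyed by MDS names
        self.cache: Dict[Tuple[str, ...], dict] = {}
        self.mds_lookups = 0

    def walk(self, path: str) -> dict:
        node = self.mds_tree
        key: Tuple[str, ...] = ()
        for component in path.strip("/").split("/"):
            # The mapping is applied once, to the user-supplied component.
            key += (to_mds_name(component),)
            if key not in self.cache:
                self.mds_lookups += 1  # cache miss: ask the MDS
                self.cache[key] = node[key[-1]]
            node = self.cache[key]
        return node
```

For example, walking "/Docs/README.txt" and then "/DOCS/readme.TXT" costs two MDS lookups in total and leaves two cache entries.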

We use libicu (indirectly through boost) for actually doing the
normalization / casefolding. It's simple [3]. It works. Despite your
(Linus) objections to this in prior postings, I do not see this as
problematic. There are backwards compatibility guarantees in the
standard [4].  Does that mean mistakes can't happen? No. Certainly
there could be a backwards-compatibility breakage where we have two
physical dentries with names that should be equivalent: one dentry
shadows the other for some clients with upgraded Unicode tables. Even
so, I do not see this as a significant barrier to adopting the Unicode
routines. In my opinion, the real danger is a file system person
foolishly thinking they know best by rolling their own mapping table
and discovering why that's a terrible idea. I think this thread
illustrates that in several places.
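
The shadowing failure mode can be simulated in a few lines. This is an artificial stand-in, not a real Unicode version change: `str.lower()` plays the role of an older mapping table and `str.casefold()` a newer one, because the two disagree on "ß". `old_fold`, `new_fold` and `lookup` are hypothetical helpers. A client using the old mapping creates two physical dentries; a client using the new mapping folds both spellings to one key, so the other dentry can no longer be reached by lookup even though it is still stored on the MDS.

```python
import unicodedata
from typing import Callable, Dict, Optional


def old_fold(name: str) -> str:
    # Stand-in for an older table: plain lowercasing keeps "ß" as "ß".
    return unicodedata.normalize("NFD", name.lower())


def new_fold(name: str) -> str:
    # Stand-in for an upgraded table: full case folding turns "ß" into "ss".
    return unicodedata.normalize("NFD", name.casefold())


def lookup(mds: Dict[str, str], fold: Callable[[str], str], name: str) -> Optional[str]:
    """Return the alternate_name of the dentry a client using `fold` reaches."""
    return mds.get(fold(name))


# A client using the old table creates two distinct physical dentries.
mds: Dict[str, str] = {}
for original in ("Straße", "STRASSE"):
    mds[old_fold(original)] = original
```

Here an upgraded client looking up "Straße" lands on the "STRASSE" dentry, and no spelling it can type maps to the stored "straße" key.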

]

On Fri, Apr 25, 2025 at 11:41 PM Linus Torvalds
<[email protected]> wrote:
>
> On Fri, 25 Apr 2025 at 20:09, Kent Overstreet <[email protected]> 
> wrote:
> >
> > The subject is CI lookups, and I'll eat my shoe if you wrote that.
>
> Start chomping. That nasty code with d_compare and d_hash goes way back.
>
> From a quick look, it's from '97, and got merged in in 2.1.50. It was
> added (obviously) for FAT. Back then, that was the only case that
> wanted it.
>
> I don't have any archives from that time, and I'm sure others were
> involved, but that whole init_name_hash / partial_name_hash /
> end_name_hash pattern in 2.1.50 looks like code I remember. So I was
> at least part of it.
>
> The design, if you haven't figured it out yet, is that filesystems
> that have case-independent name comparisons can do their own hash
> functions and their own name comparison functions, exactly so that one
> dentry can match multiple different strings (and different strings can
> hash to the same bucket).
>
> If you get dentry aliases, you may be doing something wrong.
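
A rough Python model of the design described in the quote: the filesystem supplies its own hash and compare callbacks, so one cached name can match several spellings, provided that spellings which compare equal also hash to the same bucket. `DentryCache`, `ci_hash` and `ci_compare` are hypothetical; `ci_hash` only loosely imitates the init/partial/end hashing pattern and is not the kernel's `d_hash`/`d_compare` code.

```python
from typing import Callable, List, Optional, Tuple

HashFn = Callable[[str], int]
CompareFn = Callable[[str, str], bool]


def ci_hash(name: str) -> int:
    # Incremental hash over case-folded characters; constants are arbitrary.
    h = 0                                     # "init"
    for ch in name.casefold():
        h = (h * 31 + ord(ch)) & 0xFFFFFFFF   # "partial" step per character
    return h                                  # "end" (no final mixing here)


def ci_compare(stored: str, candidate: str) -> bool:
    # Must agree with ci_hash: names that compare equal must hash equal.
    return stored.casefold() == candidate.casefold()


class DentryCache:
    def __init__(self, hash_fn: HashFn = hash,
                 compare_fn: CompareFn = str.__eq__, buckets: int = 64) -> None:
        # Defaults model the ordinary case: exact hashing and byte-wise compare.
        self.hash_fn = hash_fn
        self.compare_fn = compare_fn
        self.table: List[List[Tuple[str, object]]] = [[] for _ in range(buckets)]

    def _bucket(self, name: str) -> List[Tuple[str, object]]:
        return self.table[self.hash_fn(name) % len(self.table)]

    def insert(self, name: str, inode: object) -> None:
        self._bucket(name).append((name, inode))

    def lookup(self, name: str) -> Optional[object]:
        for stored, inode in self._bucket(name):
            if self.compare_fn(stored, name):
                return inode
        return None
```

With `DentryCache(ci_hash, ci_compare)`, a single entry inserted as "README" answers lookups for "readme" and "ReadMe"; the default cache does not.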

I would not consider myself a kernel developer, but I assume this
terminology (dentry aliases) refers to multiple dentries in the dcache
referring to the same physical dentry on the backing file system?

If so, I can't convince myself that's a real problem. Wouldn't this be
beneficial because each application/process may utilize a different
name for the backing file system dentry? This keeps the cache hot with
relevant names without any need to do transformations on the dentry
names. Happy to learn otherwise because I expected this situation to
occur in practice with ceph-fuse. I just tested, and the dcache entry count
(/proc/sys/fs/dentry-state) increases as expected when performing case
permutations on a case-insensitive file name. I didn't observe any
cache inconsistencies when editing/removing these dentries. The danger
perhaps is cache pollution and some kind of DoS? That should be a
solvable problem but perhaps I misunderstand some complexity.
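
The alias behavior described above can be modeled as a case-sensitive cache in front of a case-insensitive backend. This is a simplified, hypothetical sketch (`AliasingCache` is not a kernel or CephFS structure): each spelling gets its own entry pointing at the same inode number, the entry count grows with case permutations, and unlink invalidates every alias of the inode so no stale spelling survives. It deliberately caches only positive lookups; negative entries for case permutations would need the same invalidation on create.

```python
from collections import defaultdict
from typing import Dict, Optional, Set


class AliasingCache:
    """Hypothetical per-spelling cache over a case-insensitive backend."""

    def __init__(self, backend: Dict[str, int]) -> None:
        self.backend = backend                  # casefolded name -> inode
        self.entries: Dict[str, int] = {}       # exact spelling -> inode
        self.aliases: Dict[int, Set[str]] = defaultdict(set)

    def lookup(self, name: str) -> Optional[int]:
        if name in self.entries:
            return self.entries[name]
        ino = self.backend.get(name.casefold())  # the fs does the mapping
        if ino is not None:
            # Cache this exact spelling as one more alias of the inode.
            self.entries[name] = ino
            self.aliases[ino].add(name)
        return ino

    def unlink(self, name: str) -> None:
        ino = self.lookup(name)
        if ino is None:
            raise FileNotFoundError(name)
        del self.backend[name.casefold()]
        # Invalidate every alias so no stale spelling survives the removal.
        for alias in self.aliases.pop(ino):
            del self.entries[alias]
```

Looking up "README", "readme" and "ReadMe" leaves three entries for one inode; unlinking through any spelling removes all of them.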

> Also, originally this was all in the same core dcache lookup path. So
> the whole "we have to check if the filesystem has its own hash
> function" ended up slowing down the normal case. It's obviously been
> massively modified since 1997 ("No, really?"), and now the code is
> very much set up so that the straight-line normal case is all the
> non-CI cases, and then case independence ends up out-of-line with its
> own dcache hash lookup loops so that it doesn't affect the normal good
> case.

It seems to me this is a good argument for keeping case-sensitivity
awareness out of the dcache. Let the fs do the namespace mapping and
accept that you may have dentry aliases.

FWIW, I also wish we didn't have to deal with case-sensitivity but we
have users/protocols to support (as usual).

[1] https://github.com/ceph/ceph/blob/ebb2f72bfc37577d5389809ba0c16fca032acd8a/src/client/Client.cc#L7732-L7735
[2] https://github.com/ceph/ceph/blob/ebb2f72bfc37577d5389809ba0c16fca032acd8a/src/client/Client.cc#L1353
[3] https://github.com/ceph/ceph/blob/ebb2f72bfc37577d5389809ba0c16fca032acd8a/src/client/Client.cc#L1299-L1340
[4] https://unicode.org/reports/tr15/#Versioning

Kind regards,

--
Patrick Donnelly
