On Thu, 24 Apr 2025 at 21:52, Kent Overstreet <kent.overstr...@linux.dev> wrote:
>
> And the attitude of "I hate this, so I'm going to partition this off as
> much as I can and spend as little time as I can on this" has all made
> this even worse - the dcache stuff is all half baked.

No. The dcache side is *correct*.

The thing is, you absolutely cannot make the case-insensitive lookup
be the fast case.

So it's partitioned off not because people don't want to deal with it
(which also admittedly _is_ true), but because partitioning off is a
firewall against the code generation garbage case that simply *cannot*
be done well and allows the proper cases to be properly optimized.

Now, if filesystem people were to see the light, and have a proper and
well-designed case insensitivity, that might change. But I've never
seen even a *whiff* of that. I have only seen bad code that
understands neither how UTF-8 works, nor how unicode works (or rather:
how unicode does *not* work - code that uses the unicode comparison
functions without a deeper understanding of what the implications
are).

Your comments blaming unicode are only another sign of that.

Because no, the problem with bad case folding isn't in unicode. It's
in filesystem people who didn't understand - and still don't, after
decades - that you MUST NOT just blindly follow some external case
folding table that you don't understand and that can change over time.

The "change over time" part is particularly vexing to me, because it
breaks one of the fundamental rules that unicode was *supposed* to
fix: no locale garbage.

And the moment you think you need "unicode versioning", you have
basically now created a locale with a different name, and you MISSED
THE WHOLE %^$*ING POINT OF IT ALL.

And yes, *those* problems come from people thinking it's "somebody
else's problem that they solved for me" without actually understanding
that no, that wasn't the case at all. Many of the unicode rules were
about *glyphs*, and simply cannot be used for filesystems or equality
comparisons.

Which isn't to say that Unicode doesn't have problems, but the real
problem is then using it without understanding the problems.

So the real issue with unicode is that it's very complicated, and it
tried to solve many different problems, and that then resulted in
people not understanding that not all of it was appropriate for
*their* use.

Part of it is the "CS disease": thinking that a generic solution is
always "better". Not so. Being overly generic is often much much worse
than having a targeted solution to an intentionally limited problem.

"Everything Should Be Made as Simple as Possible, But Not Simpler".
And involving unicode in case folding is antithetical to that
fundamental concept.

What I personally strongly feel should have been done is to just limit
case folding knowingly to a very strict subset, and people should have
said "we're being backwards compatible with FAT" or something like
that.

Instead of extending the problem space to the point where it becomes a
huge problem, re-introduces "locales" in a different guise, and
creates security issues because people don't understand just *how* big
they made the problem space.

Oh well. Rant over.

                Linus
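
For illustration only, here is a minimal sketch of what the "very strict subset" idea above could look like, assuming the subset is just ASCII A-Z. The helper names fold_ascii() and name_eq_ascii_ci() are hypothetical and are not taken from the kernel or from any filesystem; this is not how any existing casefold implementation works.

```c
#include <stddef.h>

/*
 * Hypothetical helper: fold only ASCII 'A'..'Z' to lower case.
 * Every other byte is returned unchanged. In UTF-8 all bytes of a
 * multi-byte sequence are >= 0x80, so non-ASCII characters are never
 * altered and no external folding table is consulted.
 */
static unsigned char fold_ascii(unsigned char c)
{
	if (c >= 'A' && c <= 'Z')
		return (unsigned char)(c + ('a' - 'A'));
	return c;
}

/*
 * Hypothetical comparison: returns 1 when the two names are equal
 * under ASCII-only folding, 0 otherwise. Folding never changes the
 * byte length, so names of different lengths can be rejected early.
 */
static int name_eq_ascii_ci(const char *a, size_t alen,
			    const char *b, size_t blen)
{
	size_t i;

	if (alen != blen)
		return 0;
	for (i = 0; i < alen; i++) {
		if (fold_ascii((unsigned char)a[i]) !=
		    fold_ascii((unsigned char)b[i]))
			return 0;
	}
	return 1;
}
```

Under this assumption "README.TXT" and "readme.txt" compare equal, while "É" and "é" (bytes C3 89 and C3 A9 in UTF-8) stay distinct names, and the result cannot change with a newer Unicode release because no Unicode table is involved.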