On Fri, 25 Apr 2025 at 12:40, Matthew Wilcox <wi...@infradead.org> wrote:
>
> I think this is something that NTFS actually got right. Each filesystem
> carries with it a 128KiB table that maps each codepoint to its
> case-insensitive equivalent.

I agree that that is indeed a technically correct way to deal with case sensitivity, at least from a filesystem standpoint. It does have some usability issues - exactly because of that "fixed at filesystem creation time" - but since in *practice* nobody actually cares about the odd cases, it isn't much of a real issue. And the fixed translation table means that it at least gets versioning right, and if you filled the table in sanely you don't end up with the crazy cases (ie the nonprinting characters etc), so hopefully it contains only the completely unambiguous stuff.

That said, I really suspect that in practice, folding even just the 7-bit ASCII subset would have been ok and would have obviated even that table. And I say that as somebody who grew up in an environment that used a bigger character set than that.

Of course, the NTFS stuff came about because FAT had code pages for just the 8-bit cases - and people used them, and that then caused odd issues when moving data around. Again - 8-bit tables were entirely sufficient in practice, but actually caused more problems than not doing it at all would have.

And then people go "we switched to 16-bit wide characters, so we need to expand the code table too". Which is obviously exactly how you end up with that 128KiB table.

But you have to ask yourself: do you think that the people who made the incredibly bad choice to use a fixed 16-bit wide character set - which caused literally decades of trouble in Windows, and still shows up today - then made the perfect choice when dealing with case folding?

Yeah, no.

Still, I very much agree it was a better choice than "let's call random Unicode routines we don't really appreciate the complexity of".

Linus
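
(A postscript to make the numbers concrete: 65536 possible 16-bit code units times two bytes per folded value is exactly 128KiB, and the fold itself is a single array index. A minimal sketch of such a fixed-table scheme - all names here are hypothetical, and the real NTFS on-disk $UpCase layout may differ in detail:)

#include <stdint.h>
#include <stddef.h>

/*
 * Hypothetical sketch of a fixed per-filesystem fold table:
 * 65536 entries x 2 bytes = 128KiB, decided once at mkfs time.
 * Real contents would be read from the filesystem at mount;
 * here the table is just identity-initialized as a placeholder.
 */
static uint16_t fold_table[65536];

static void fold_table_init(void)
{
	for (uint32_t c = 0; c < 65536; c++)
		fold_table[c] = (uint16_t)c;
	/* mkfs would overwrite entries with its chosen case mappings */
}

static inline uint16_t fold(uint16_t c)
{
	return fold_table[c];
}

/* Case-insensitive comparison of two UTF-16 names via the table. */
static int name_casecmp(const uint16_t *a, size_t alen,
			const uint16_t *b, size_t blen)
{
	size_t i, n = alen < blen ? alen : blen;

	for (i = 0; i < n; i++) {
		int d = (int)fold(a[i]) - (int)fold(b[i]);
		if (d)
			return d;
	}
	return alen == blen ? 0 : (alen < blen ? -1 : 1);
}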
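
(And the 7-bit-ASCII-only alternative argued for above needs no table at all, just a range check - again a minimal sketch with hypothetical names:)

/* Fold only 7-bit ASCII; every other byte compares as-is. */
static inline unsigned char ascii_fold(unsigned char c)
{
	return (c >= 'a' && c <= 'z') ? c - ('a' - 'A') : c;
}

/* strcasecmp-style comparison of two NUL-terminated names. */
static int ascii_name_casecmp(const char *a, const char *b)
{
	unsigned char ca, cb;

	do {
		ca = ascii_fold((unsigned char)*a++);
		cb = ascii_fold((unsigned char)*b++);
	} while (ca && ca == cb);

	return (int)ca - (int)cb;
}

(Note the trade-off this buys: identical behaviour on every filesystem, nothing fixed at creation time, no versioning, and no table to fill in wrong.)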