Hello,

Stuart Henderson wrote on Sat, May 31, 2025 at 10:45:17AM -0000:
> On 2025-05-31, rsyk...@disroot.org <rsyk...@disroot.org> wrote:
>> I was surprised to learn that 'grep -i' does not
>> really work for accented letters

> OpenBSD base doesn't support LC_COLLATE.

Everything that sthen@ said is correct.

Let me add that supporting LC_COLLATE is not even a long-term goal.
LC_COLLATE is among the most complicated aspects of locales.
The collation order depends on the language, and for some languages,
there is even more than one collation order that is commonly used.
We certainly do not want to poison our libc with that amount of
complexity.

That said, implementing 'grep -i' for non-ASCII characters does not
strictly require LC_COLLATE support (whereas sort(1), for example,
might). What *is* needed is working towlower(3) support in libc,
which is controlled by LC_CTYPE, and which we do have (and it is
reasonably up to date because our libc Unicode support follows Perl,
currently at Unicode Version 15.0.0, released in September 2022).
For example, towlower(U+017D) works for me and returns U+017E.

What you want requires wide-character support in both regexec(3)
and grep(1) such that
 (1) U+017D can be recognized as a character rather than being
     treated as two bytes,
 (2) towlower(3) can transform it to U+017E, and
 (3) the result can then be compared to the command line argument
     in a wchar_t to wchar_t comparison.
These are multiple tasks of significant difficulty and size.

Maybe, as a partial solution, it would even be possible to improve
*only* grep(1) while leaving the (even more scary) regexec(3) alone,
i.e. have grep(1), when called with -i, convert both the command line
arguments and every input line to lower case with towlower(3), then
pass both to the narrow-character regexec(3), which should work for
your use case. It would not work for other use cases though; for
example, /./ still wouldn't match an accented character.

Yours,
  Ingo
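
P.S.
To make the partial solution described above a bit more concrete,
here is a minimal sketch in C. It is not taken from the grep(1)
source; the function names mb_tolower() and match_icase() are made
up for illustration, and a real patch would have to fit into the
existing grep(1) code paths. The sketch assumes that the caller has
already selected a UTF-8 LC_CTYPE, for example with
setlocale(LC_CTYPE, ""); without that, only ASCII letters are folded.

```c
#include <regex.h>
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper: return a newly allocated lower-case copy
 * of the multibyte string s, using the current LC_CTYPE locale.
 * Returns NULL on an invalid byte sequence or if out of memory.
 */
char *
mb_tolower(const char *s)
{
	wchar_t	*ws;
	char	*out;
	size_t	 i, n, outsz;

	/* Count the characters; fails on invalid input. */
	if ((n = mbstowcs(NULL, s, 0)) == (size_t)-1)
		return NULL;
	if ((ws = calloc(n + 1, sizeof(*ws))) == NULL)
		return NULL;
	mbstowcs(ws, s, n + 1);

	/* Fold each wide character individually. */
	for (i = 0; i < n; i++)
		ws[i] = (wchar_t)towlower((wint_t)ws[i]);

	/* Worst case: every character needs MB_CUR_MAX bytes. */
	outsz = n * MB_CUR_MAX + 1;
	if ((out = malloc(outsz)) != NULL &&
	    wcstombs(out, ws, outsz) == (size_t)-1) {
		free(out);
		out = NULL;
	}
	free(ws);
	return out;
}

/*
 * Hypothetical helper: lower-case both the pattern and the line,
 * then hand them to the narrow-character regcomp(3)/regexec(3).
 * Returns 1 on a match, 0 on no match, -1 on error.
 */
int
match_icase(const char *pattern, const char *line)
{
	regex_t	 re;
	char	*lp, *ll;
	int	 rv = -1;

	lp = mb_tolower(pattern);
	ll = mb_tolower(line);
	if (lp != NULL && ll != NULL &&
	    regcomp(&re, lp, REG_EXTENDED | REG_NOSUB) == 0) {
		rv = regexec(&re, ll, 0, NULL, 0) == 0;
		regfree(&re);
	}
	free(lp);
	free(ll);
	return rv;
}
```

In a UTF-8 locale, match_icase() should then find a pattern
containing U+017E in a line containing U+017D. The limitations
mentioned above remain, and lower-casing the pattern adds one of
its own: a bracket expression like [[:upper:]] can no longer match
any cased letter, because the input lines have been folded to lower
case before regexec(3) sees them.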