Re: case-insensitive grep with accented letters

Stuart Henderson Sat, 31 May 2025 03:46:01 -0700

On 2025-05-31, rsyk...@disroot.org <rsyk...@disroot.org> wrote:
> Dear list,
>
>
> I was surprised to learn that 'grep -i' does not
> really work for accented letters
>
> odin:~$ cat a
> křížala
> kŘíŽala
> odin:~$ grep -i ž a
> křížala
> odin:~$ grep -i Ž a
> kŘíŽala
>
> As I had LC_COLLATE="C", I tried also with this
> set to en_US.UTF-8, but to no avail.
>
> Does grep -i only work for ascii letters?


Yes, that's expected.

OpenBSD base doesn't support LC_COLLATE.

$ man -k ANY=LC_COLLATE
locale(1) - character encoding and localization conventions
glob, globfree(3) - generate pathnames matching a pattern
setlocale(3) - select character encoding
strcoll, strcoll_l(3) - compare strings according to current collation
strxfrm, strxfrm_l(3) - transform a string under locale
wcscoll, wcscoll_l(3) - compare wide strings according to the current collation
wcsxfrm, wcsxfrm_l(3) - transform a wide string under locale
$ man locale
LOCALE(1)                 General Commands Manual                 LOCALE(1)

NAME
     locale – character encoding and localization conventions

SYNOPSIS
     locale [-a | -m | charmap]
[...]

     A locale is a set of environment variables telling programs which
     character encoding, language and cultural conventions the user
     prefers.  Programs in the OpenBSD base system ignore the locale except
     for the character encoding, and it is not recommended to use any of
     these variables except that the following non-default setting is
     supported as an option:

           export LC_CTYPE=en_US.UTF-8

     Programs installed from packages(7) may or may not change behavior
     according to the locale.  Many programs use the X/Open System
     Interfaces naming scheme for the contents of the variables listed
     below, which is language[_TERRITORY][.encoding][@modifier]
[...]

> Is there a general way to achieve 'true' case
> insensitive match (other than listing all possibly
> present accented letters in both forms, i.e.,
> as [žŽ] in my case)?

ggrep (GNU grep, installed from packages) handles this instance, but I
don't know how reliable that is in general.
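
As a locale-independent fallback, the explicit listing from the question
can also be written as two fixed-string patterns. This is only a minimal
sketch, assuming a POSIX grep that accepts -F and repeated -e; the
temporary file merely stands in for the file "a" from the question.

```sh
# Recreate the two-line sample file from the question in a temp file.
tmp=$(mktemp)
printf 'křížala\nkŘíŽala\n' > "$tmp"

# Give both case forms as separate fixed-string patterns. Each pattern is
# compared byte for byte, so no locale-aware case folding is needed.
matches=$(grep -F -e 'ž' -e 'Ž' "$tmp")
printf '%s\n' "$matches"

rm -f "$tmp"
```

Both lines of the sample should be printed. This only scales to a handful
of known letters, so it is a workaround rather than a general answer.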


-- 
Please keep replies on the mailing list.
