On 2025-05-31, [email protected] <[email protected]> wrote:
> Dear list,
>
> I was surprised to learn that 'grep -i' does not
> really work for accented letters:
>
> odin:~$ cat a
> křížala
> kŘíŽala
> odin:~$ grep -i ž a
> křížala
> odin:~$ grep -i Ž a
> kŘíŽala
>
> As I had LC_COLLATE="C", I also tried with it
> set to en_US.UTF-8, but to no avail.
>
> Does grep -i only work for ASCII letters?
Yes, that's expected.
OpenBSD base doesn't support LC_COLLATE.
$ man -k ANY=LC_COLLATE
locale(1) - character encoding and localization conventions
glob, globfree(3) - generate pathnames matching a pattern
setlocale(3) - select character encoding
strcoll, strcoll_l(3) - compare strings according to current collation
strxfrm, strxfrm_l(3) - transform a string under locale
wcscoll, wcscoll_l(3) - compare wide strings according to the current collation
wcsxfrm, wcsxfrm_l(3) - transform a wide string under locale
$ man locale
LOCALE(1) General Commands Manual LOCALE(1)
NAME
locale – character encoding and localization conventions
SYNOPSIS
locale [-a | -m | charmap]
[...]
A locale is a set of environment variables telling programs which
character encoding, language and cultural conventions the user
prefers. Programs in the OpenBSD base system ignore the locale except
for the character encoding, and it is not recommended to use any of
these variables except that the following non-default setting is
supported as an option:
export LC_CTYPE=en_US.UTF-8
Programs installed from packages(7) may or may not change behavior
according to the locale. Many programs use the X/Open System
Interfaces naming scheme for the contents of the variables listed
below, which is language[_TERRITORY][.encoding][@modifier]
[...]
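To illustrate the difference (a sketch only, not how OpenBSD grep is implemented): case folding that only knows about ASCII leaves the two-byte UTF-8 sequences for Ž (C5 BD) and ž (C5 BE) untouched, so the two spellings never compare equal. Unicode-aware folding maps one to the other. The Python below assumes UTF-8 input and uses bytes.lower(), which converts only A-Z, as a stand-in for an ASCII-only matcher.

```python
# Illustration only: ASCII-only case folding versus Unicode-aware folding.
# bytes.lower() converts only the ASCII letters A-Z; every other byte,
# including both bytes of a UTF-8 encoded "Ž", passes through unchanged.

def ascii_fold(text: str) -> bytes:
    """Lower-case only A-Z in the UTF-8 encoding of text."""
    return text.encode("utf-8").lower()


def unicode_fold(text: str) -> str:
    """Full Unicode case folding, so "Ž" becomes "ž", "Ř" becomes "ř", etc."""
    return text.casefold()


lines = ["křížala", "kŘíŽala"]
needle = "Ž"

ascii_hits = [line for line in lines if ascii_fold(needle) in ascii_fold(line)]
unicode_hits = [line for line in lines if unicode_fold(needle) in unicode_fold(line)]

print(ascii_hits)    # ['kŘíŽala']  (same result as 'grep -i Ž a' above)
print(unicode_hits)  # ['křížala', 'kŘíŽala']
```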
> Is there a general way to achieve a 'true'
> case-insensitive match (other than listing all
> possibly present accented letters in both forms,
> i.e., as [žŽ] in my case)?
ggrep (GNU grep, installed from packages) does in this instance, but I don't know how reliable that is.
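Outside grep, a minimal sketch of the same idea: Python's re module applies Unicode case-insensitive matching to str patterns, so re.IGNORECASE matches ž against Ž without spelling out [žŽ]. This is an illustrative stand-in, not what ggrep does internally, and the function name grep_i is hypothetical.

```python
import re
from typing import Iterable, List


def grep_i(pattern: str, lines: Iterable[str]) -> List[str]:
    """Hypothetical helper: return the lines matching pattern case-insensitively.

    For str patterns, re uses Unicode matching by default, so IGNORECASE
    treats "ž" and "Ž" as the same letter.
    """
    rx = re.compile(pattern, re.IGNORECASE)
    return [line for line in lines if rx.search(line)]


lines = ["křížala", "kŘíŽala"]
print(grep_i("ž", lines))  # ['křížala', 'kŘíŽala']
print(grep_i("Ž", lines))  # ['křížala', 'kŘíŽala']
```

For the file from the original message, assuming it is UTF-8, something like grep_i("ž", open("a", encoding="utf-8").read().splitlines()) should return both lines.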
--
Please keep replies on the mailing list.