Re: bash-4.3: casemod word expansions broken with UTF-8

isabella parakiss Mon, 16 Nov 2015 17:10:45 -0800

On 11/15/15, Ulrich Mueller <u...@gentoo.org> wrote:
> Description:
>       In a UTF-8 locale like en_US.UTF-8, the case-modifying
>       parameter expansions sometimes return invalid UTF-8 encodings.
>
>       This seems to happen when the UTF-8 byte sequences that are
>       encoding upper and lower case have different lengths.
>
> Repeat-By:
>       $ LC_ALL=en_US.UTF-8
>       $ x=$'\xc4\xb1' # LATIN SMALL LETTER DOTLESS I
>       $ echo -n "${x^}" | od -t x1
>       0000000 49 b1
>       0000002
>
>       This should have output "49" for "I" only. The "b1" is illegal
>       as the first byte of a UTF-8 sequence.
>
>       $ x=$'\xe1\xba\x9e' # LATIN CAPITAL LETTER SHARP S
>       $ echo -n "${x,}" | od -t x1
>       0000000 c3 9f 9e
>       0000003
>
>       This should have output "c3 9f" (for "sharp s") only.
>


Both examples should work as expected in 4.4-beta.
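
For anyone wanting to check a particular build, here is a minimal sketch
that reruns both cases from the report and prints the bytes before and
after the expansion. The hex_bytes helper is a made-up name for this
example, not a bash builtin, and the script assumes the en_US.UTF-8
locale is installed. Per the original report, a fixed build should show
49 for the first case and c39f for the second.

```shell
#!/usr/bin/env bash
# Sketch only: hex_bytes is a hypothetical helper, not part of bash.
hex_bytes() {
    # Print the bytes of $1 as lowercase hex with no separators.
    printf '%s' "$1" | od -An -t x1 | tr -d ' \n'
}

# Assumption: en_US.UTF-8 is installed. If it is not, bash prints a
# warning and the results below say nothing about the UTF-8 handling.
LC_ALL=en_US.UTF-8

echo "bash ${BASH_VERSION}"

x=$'\xc4\xb1'        # LATIN SMALL LETTER DOTLESS I
echo "dotless i: $(hex_bytes "$x") -> $(hex_bytes "${x^}")"

y=$'\xe1\xba\x9e'    # LATIN CAPITAL LETTER SHARP S
echo "capital sharp s: $(hex_bytes "$y") -> $(hex_bytes "${y,}")"
```

Any trailing byte left over after the converted character (b1 or 9e in
the report) means the build still has the problem.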


---
xoxo iza
