On Thu, May 23, 2024 at 10:25 AM Chet Ramey <chet.ra...@case.edu> wrote:
>
> On 5/21/24 2:42 PM, Grisha Levit wrote:
> > Avoid using (size_t)-1 as an offset.
>
> I can't reproduce this on macOS. Where is the code that's using -1 as an
> offset?

The loop in rl_change_case does the following:

    rl_change_case(count=-1, op=2) at text.c:1483:9
       1481       while (start < end)
       1482         {
    -> 1483           c = _rl_char_value (rl_line_buffer, start);

    _rl_char_value(buf="\xc0", ind=0) at mbutil.c:493:23
       491        l = strlen (buf);
       492        if (ind + 1 >= l)
    -> 493          return ((WCHAR_T) buf[ind]);

    (wchar_t) c = L'À'

This seems questionable, since a string consisting of \xC0 and a string
actually representing \u00C0 (\xC3\x80) will both return the same thing.

The next check passes, since C is LATIN CAPITAL LETTER A WITH GRAVE:

    rl_change_case(count=-1, op=2) at text.c:1487:28
    -> 1487           if (_rl_walphabetic (c) == 0)
       1488             {
       1489               inword = 0;
       1490               start = next;
       1491               continue;

    _rl_walphabetic(wc=L'À') at util.c:89:5
       88         if (iswalnum (wc))
    -> 89           return (1);

So we call mbrtowc on the same string position and, since this is not a
valid multibyte character, (size_t)-1 is stored in M.

    rl_change_case(count=-1, op=2) at text.c:1512:22
    -> 1512               m = MBRTOWC (&wc, rl_line_buffer + start, end - start, &mps);

    (size_t) m = 18446744073709551615

Then we again interpret \xC0 as if it were \u00C0:

    rl_change_case(count=-1, op=2) at text.c:1514:20
       1513               if (MB_INVALIDCH (m))
    -> 1514                 wc = (WCHAR_T)rl_line_buffer[start];

    (wchar_t) wc = L'À'

And lowercase that character, storing its length in MLEN.

    rl_change_case(count=-1, op=2) at text.c:1517:11
    -> 1517           nwc = (nop == UpCase) ? _rl_to_wupper (wc) : _rl_to_wlower (wc);

    rl_change_case(count=-1, op=2) at text.c:1524:28
    -> 1524               mlen = WCRTOMB (mb, nwc, &ts);

    (wchar_t) nwc = L'à'
    (int) mlen = 2

Since WC and NWC are different, and M (being (size_t)-1) is greater than
MLEN, we take this branch:

    rl_change_case(count=-1, op=2) at text.c:1544:13
       1541               else if (m > mlen)
       1542                 {
       1543                   memcpy (s, mb, mlen);
    -> 1544                   memmove (s + mlen, s + m, (e - s) - m);

So the second arg to memmove is a pointer one behind S.
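
To make the arithmetic in the trace above concrete, here is a minimal standalone sketch. It is not readline code and not the proposed patch: invalid_mb_len and replace_bytes are hypothetical helpers made up for illustration, and clamping an invalid length to one byte is only one possible guard, assumed here. Whether a stray \xC0 should be case-converted at all is the separate question raised earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the MB_INVALIDCH idea: mbrtowc reports an
   invalid sequence as (size_t)-1 and an incomplete one as (size_t)-2. */
static int
invalid_mb_len (size_t m)
{
  return m == (size_t)-1 || m == (size_t)-2;
}

/* Hypothetical helper: replace the M bytes at BUF + START with the MLEN
   bytes in MB.  BUF holds *LEN bytes plus a terminating NUL and has room
   for CAP bytes in total.  Returns 0 on success, -1 if the result would
   not fit. */
static int
replace_bytes (char *buf, size_t *len, size_t cap, size_t start,
               size_t m, const char *mb, size_t mlen)
{
  size_t tail;

  /* Assumed guard: an undecodable byte occupies exactly one byte, so
     never let (size_t)-1 or (size_t)-2 reach the pointer arithmetic. */
  if (invalid_mb_len (m))
    m = 1;

  assert (start + m <= *len);
  if (*len - m + mlen + 1 > cap)
    return -1;

  /* Move the rest of the line (including the NUL) first, then copy the
     new bytes in.  This covers both the shrinking and growing cases. */
  tail = *len - (start + m);
  memmove (buf + start + mlen, buf + start + m, tail + 1);
  memcpy (buf + start, mb, mlen);
  *len = *len - m + mlen;
  return 0;
}
```

Without the clamp, m stays at (size_t)-1, the comparison m > mlen is true for any real mlen, and s + m wraps around to s - 1, which is the pointer seen in the memmove call above. With the clamp, calling replace_bytes on a 16-byte buffer holding "\xc0" followed by "xyz", with m = (size_t)-1 and the two bytes \xc3\xa0 as the replacement, grows the text by one byte instead of reading before the start of the buffer.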