On Tue, May 19, 2026 at 10:18 PM Timofei Zhakov <[email protected]> wrote: > > On Tue, May 19, 2026 at 9:42 PM Evgeny Kotkov > <[email protected]> wrote: > > > > Timofei Zhakov <[email protected]> writes: > > > > > Committed the test in r1934405 and the rest of the patch in r1934407. > > > Removed svn_utf__cstring_is_valid() check in r1934408. > > > > Thanks for the change! I like the idea, and the implementation looks > > good to me overall. > > > > I'm a bit late to the party, but here are a few minor comments and > > suggestions regarding the code: > > > > + /* Convert the UTF-8 string to UTF-32 (UCS4) which is the format > > + * utf8proc_charwidth() expects, and get the width of each character. > > + * We don't need much error checking since the input is valid UTF-8. */ > > + while (*cstr) > > > > After r1934408, the comment stating that "the input is valid UTF-8" seems > > outdated, because we no longer check for valid UTF-8 in advance. > > Oh, this's a good catch, thanks. I'll fix the comment.
Committed in r1934427. > > > + int nbytes = utf8proc_iterate((apr_byte_t*)cstr, -1, &ucs); > > > > Since `cstr` is a `const char *`, I think this cast unnecessarily discards > > constness. > > > > Also, because utf8proc_iterate() expects a `const utf8proc_uint8_t *`, > > I'd say that casting directly to that target type is safer than using > > `apr_byte_t *`, because the latter could theoretically denote a > > different underlying type. Committed in r1934429. > > > > A couple of similar observations: > > > > - utf8proc_iterate() returns a utf8proc_ssize_t. Assigning this to > > `int nbytes` risks truncation in environments where int is smaller > > than utf8proc_ssize_t: > > > > + int nbytes = utf8proc_iterate((apr_byte_t*)cstr, -1, &ucs); > > Well technically it only returns 1,2,3,4, or -3, but I agree that it should be better to use this type anyway. Committed in r1934430. > > - utf8proc_iterate() puts the codepoint result in a utf8proc_int32_t, > > whereas we currently type it as an apr_int32_t: Committed in r1934428. > > > > + apr_int32_t ucs; > > > > While those types are almost certainly compatible, I think that changing > > it to `utf8proc_int32_t` would guarantee that we use a correct type. > > It would also ensure that `utf8proc_charwidth(utf8proc_int32_t c)` gets > > called with an argument of the exactly expected type. > > I'm pretty sure I copied this code from another function in the same > file, so I guess this is an API misuse that needs to be fixed in all > those places (svn_utf__fuzzy_escape is the only other place). -- Timofei Zhakov

