details:   https://hg.nginx.org/nginx/rev/a10210a45c8b
branches:
changeset: 8142:a10210a45c8b
user:      Yugo Horie <u5.ho...@gmail.com>
date:      Thu Feb 23 08:09:50 2023 +0900
description:
Core: stricter UTF-8 handling in ngx_utf8_decode().

A UTF-8 octet sequence cannot start with a 11111xxx byte (0xf8 or above),
see https://datatracker.ietf.org/doc/html/rfc3629#section-3.  Previously,
such bytes were accepted by ngx_utf8_decode() and misinterpreted as 11110xxx
bytes (as in a 4-byte sequence).  While unlikely, this can potentially cause
issues.

The fix is to explicitly reject such bytes in ngx_utf8_decode().

diffstat:

 src/core/ngx_string.c |  7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diffs (17 lines):

diff -r 2acb00b9b5ff -r a10210a45c8b src/core/ngx_string.c
--- a/src/core/ngx_string.c	Thu Feb 23 20:50:03 2023 +0300
+++ b/src/core/ngx_string.c	Thu Feb 23 08:09:50 2023 +0900
@@ -1364,7 +1364,12 @@ ngx_utf8_decode(u_char **p, size_t n)
 
     u = **p;
 
-    if (u >= 0xf0) {
+    if (u >= 0xf8) {
+
+        (*p)++;
+        return 0xffffffff;
+
+    } else if (u >= 0xf0) {
 
         u &= 0x07;
         valid = 0xffff;
_______________________________________________
nginx-devel mailing list
nginx-devel@nginx.org
https://mailman.nginx.org/mailman/listinfo/nginx-devel
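
For illustration only, the lead-byte check before and after this change can be
sketched in isolation as follows.  The helper names (is_4byte_lead_old,
is_4byte_lead_strict, lead_payload) are hypothetical and do not exist in nginx;
only the 0xf0 and 0xf8 thresholds and the 0x07 mask are taken from the diff
above.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; this is not nginx source code. */

/* Check as it was before the change: any byte >= 0xf0 is treated as the
 * lead byte of a 4-byte sequence, including 11111xxx bytes. */
static int
is_4byte_lead_old(uint8_t u)
{
    return u >= 0xf0;
}

/* Check after the change: only 11110xxx bytes (0xf0..0xf7) qualify;
 * 11111xxx bytes (0xf8..0xff) are rejected. */
static int
is_4byte_lead_strict(uint8_t u)
{
    return u >= 0xf0 && u < 0xf8;
}

/* Payload bits kept by the "u &= 0x07" mask shown in the diff. */
static uint32_t
lead_payload(uint8_t u)
{
    return (uint32_t) (u & 0x07);
}
```

With these helpers, 0xf8 and 0xf0 both mask to the same payload, which is why
the old check could not distinguish a 11111xxx byte from a valid 11110xxx lead
byte, while the strict check rejects 0xf8 before the mask is applied.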