Hi,

Sorry for bothering you. It looks good to me. Thanks!

Yugo Horie

On Thu, Mar 2, 2023 at 8:51 Maxim Dounin <mdou...@mdounin.ru> wrote:

> Hello!
>
> On Thu, Feb 23, 2023 at 09:24:52AM +0900, u5h wrote:
>
> > Thanks reviewing!
> >
> > I agree with your early return strategy and I would reconsider that
> > condition below.
> >
> > # HG changeset patch
> > # User Yugo Horie <u5.ho...@gmail.com>
> > # Date 1677107390 -32400
> > # Thu Feb 23 08:09:50 2023 +0900
> > # Node ID a3ca45d39fcfd32ca92a6bd25ec18b6359b90f1a
> > # Parent f4653576ffcd286bed7229e18ee30ec3c713b4de
> > Core: restrict the rule of utf-8 decode.
> >
> > The first byte being above 0xf8 which is referred to 5byte
> > over length older utf-8 becomes invalid.
> > Even the range of the first byte from 0xf5 to
> > 0xf7 is valid in the term of the codepoint decoding.
> > See https://datatracker.ietf.org/doc/html/rfc3629#section-4.
> >
> > diff -r f4653576ffcd -r a3ca45d39fcf src/core/ngx_string.c
> > --- a/src/core/ngx_string.c Thu Feb 23 07:56:44 2023 +0900
> > +++ b/src/core/ngx_string.c Thu Feb 23 08:09:50 2023 +0900
> > @@ -1363,8 +1363,12 @@
> >      uint32_t u, i, valid;
> >
> >      u = **p;
> > -
> > -    if (u >= 0xf0) {
> > +    if (u >= 0xf8) {
> > +
> > +        (*p)++;
> > +        return 0xffffffff;
> > +
> > +    } else if (u >= 0xf0) {
> >
> >          u &= 0x07;
> >          valid = 0xffff;
>
> Slightly adjusted the commit log to better explain the issue (and
> restored the accidentally removed empty line). Please take a look
> if it seems good enough:
>
> # HG changeset patch
> # User Yugo Horie <u5.ho...@gmail.com>
> # Date 1677107390 -32400
> # Thu Feb 23 08:09:50 2023 +0900
> # Node ID a10210a45c8b6e6bb75e98b2fd64a80c184ae247
> # Parent 2acb00b9b5fff8a97523b659af4377fc605abe6e
> Core: stricter UTF-8 handling in ngx_utf8_decode().
>
> An UTF-8 octet sequence cannot start with a 11111xxx byte (above 0xf8),
> see https://datatracker.ietf.org/doc/html/rfc3629#section-3. Previously,
> such bytes were accepted by ngx_utf8_decode() and misinterpreted as 11110xxx
> bytes (as in a 4-byte sequence). While unlikely, this can potentially cause
> issues.
>
> Fix is to explicitly reject such bytes in ngx_utf8_decode().
>
> diff --git a/src/core/ngx_string.c b/src/core/ngx_string.c
> --- a/src/core/ngx_string.c
> +++ b/src/core/ngx_string.c
> @@ -1364,7 +1364,12 @@ ngx_utf8_decode(u_char **p, size_t n)
>
>      u = **p;
>
> -    if (u >= 0xf0) {
> +    if (u >= 0xf8) {
> +
> +        (*p)++;
> +        return 0xffffffff;
> +
> +    } else if (u >= 0xf0) {
>
>          u &= 0x07;
>          valid = 0xffff;
>
>
> --
> Maxim Dounin
> http://mdounin.ru/
> _______________________________________________
> nginx-devel mailing list
> nginx-devel@nginx.org
> https://mailman.nginx.org/mailman/listinfo/nginx-devel
>
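
For readers following the thread, here is a rough standalone sketch of the lead-byte check the patch above introduces. It is not nginx code: the helper name utf8_lead_len() is made up for illustration, and it only looks at the bit patterns from RFC 3629, section 3. It does not check continuation bytes, overlong forms, or the U+10FFFF upper bound. The point it illustrates is that 0xf8 & 0x07 is 0, so without an explicit test a 11111xxx byte gets masked exactly like 0xf0 and is taken as the start of a 4-byte sequence.

```c
/*
 * Hypothetical helper, not part of nginx: classify a UTF-8 lead byte
 * by its bit pattern only.  Returns the sequence length the byte
 * announces (1..4), or 0 if the byte cannot start a sequence.
 */
static int
utf8_lead_len(unsigned char b)
{
    if (b >= 0xf8) {
        return 0;   /* 11111xxx: the case the patch now rejects */
    }

    if (b >= 0xf0) {
        return 4;   /* 11110xxx */
    }

    if (b >= 0xe0) {
        return 3;   /* 1110xxxx */
    }

    if (b >= 0xc0) {
        return 2;   /* 110xxxxx */
    }

    if (b >= 0x80) {
        return 0;   /* 10xxxxxx: continuation byte, not a lead byte */
    }

    return 1;       /* 0xxxxxxx: ASCII */
}
```

With this sketch, utf8_lead_len(0xf7) gives 4 while utf8_lead_len(0xf8) and utf8_lead_len(0xff) give 0, which mirrors the ordering of the two branches in the diff: the 0xf8 test has to come before the 0xf0 test.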