On Sat, Jun 29, 2013 at 07:13:40PM -0700, Junio C Hamano wrote: > "brian m. carlson" <sand...@crustytoothpaste.net> writes: > Does this correspond to the following comment in the same file, and > if so, shouldn't this part of your patch?
Yes, yes, it should. > We used to allow te original up-to-6 form and this update is about > rejecting anything above U+10FFFF (in line with e.g. RFC 3629)? Correct. Since no codepoints above U+10FFFF will ever be assigned, there's no point in trying to allow those UTF-8 sequences, since they will be caught below anyway. Some four-byte sequences will produce too-large codepoints, but all five- and six-byte sequences are guaranteed to. > As you use separate if statements for this check, perhaps you can > give each of them a better comment to say what each is rejecting? > E.g. > > /* do not allow range for surrogate pair */ > if (codepoint >= 0xd800 && codepoint <= 0xdfff) > return bad_offset; Sure. > As that comment I quoted at the beginning says, we did not check for > invalid encoded values and the primary reason for it is beccause > this function did not want to look into the actual values here. But > now you are looking into "codepoint", you can now also check for > "overlong" form (e.g. sequence "C0 80" turning into U+0000)? Correct. That's what my second patch does. I thought I would make the changes separately, since they're slightly different and from what I've seen git development prefers small, independent patches, but if you'd prefer, I can squash them into one patch. > Perhaps > > test_expect_success 'test name' ' > test_when_finished "rm -f $HOME/stderr" && I wasn't aware that existed, but it makes sense to use it. > > +Invalid surrogate:??? > > I suspect that I did not receive what you intended to send. Do you > want to send this part as a binary patch perhaps? If you ended up with an encoding of U+D800, then you got it. Otherwise, I can resend it as a binary patch during the reroll. -- brian m. carlson / brian with sandals: Houston, Texas, US +1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187
Description: Digital signature