Jon Nelson <jnel...@jamponi.net> writes:

> On Thu, Aug 2, 2012 at 3:21 PM, Simon Josefsson <si...@josefsson.org> wrote:
>> Jon Nelson <jnel...@jamponi.net> writes:
>>
>>> I've encountered two bugs or misfeatures in libidn:
>>
>> Hi! Thanks for your report.
>>
>>> 1. given an idna-encoded input, it is possible to generate invalid
>>> UTF-8 output (as defined by RFC3629). The UTF-8 is invalid because
>>> codepoints above 0x10FFFF are used.
>>>
>>> See http://tools.ietf.org/html/rfc3629
>>
>> Can you be more concrete, what inputs does this happen for and what
>> output would you expect? An example would help illustrate the problem.
>
> Example: echo xn--1234xxxxxxxxxx | idn -u --debug
Thank you.  Interestingly, the Punycode code from RFC 3492 happily decodes the string to Unicode code points above U+10FFFF.  I can't see anything in RFC 3492 (Punycode) or RFC 3490 (IDNA ToUnicode) that requires checking for code points above U+10FFFF, or that says where such a check would be done.

Arguably, the final conversion from UCS-4 to UTF-8 should trigger an error in libidn, but by then the damage is already done: ToUnicode has returned a sequence of code points that are illegal.  So it seems ToUnicode should perform this check somewhere, but reading RFC 3492 and RFC 3490 I can't find a suitable place for it.

Thoughts?

/Simon
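
P.S.  To make the question concrete, the missing check would amount to something like the sketch below.  The helper names are made up for illustration, and the exclusion of the UTF-16 surrogate range U+D800..U+DFFF goes beyond the >U+10FFFF case above, although RFC 3629 forbids encoding both.  Where such a check belongs (in the Punycode decoder, in ToUnicode, or just before the UCS-4 to UTF-8 conversion) is exactly the open question.

#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper, not part of libidn: nonzero if CP is
   representable in UTF-8 under RFC 3629, i.e. at most U+10FFFF and
   not a UTF-16 surrogate (U+D800..U+DFFF).  */
int
ucs4_valid_for_utf8 (uint32_t cp)
{
  return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

/* Validate a whole decoded buffer, e.g. the code points produced by
   the Punycode decoder, before handing it to the UTF-8 encoder.  */
int
ucs4_string_valid (const uint32_t *str, size_t len)
{
  size_t i;
  for (i = 0; i < len; i++)
    if (!ucs4_valid_for_utf8 (str[i]))
      return 0;
  return 1;
}

Doing the check as part of ToUnicode would keep the Punycode decoder identical to the RFC 3492 sample code while still preventing illegal code points from ever reaching the UTF-8 conversion.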