Jon Nelson <jnel...@jamponi.net> writes:

> On Thu, Aug 2, 2012 at 3:21 PM, Simon Josefsson <si...@josefsson.org> wrote:
>> Jon Nelson <jnel...@jamponi.net> writes:
>>
>>> I've encountered two bugs or misfeatures in libidn:
>>
>> Hi!  Thanks for your report.
>>
>>> 1. given an idna-encoded input, it is possible to generate invalid
>>> UTF-8 output (as defined by RFC3629). The UTF-8 is invalid because
>>> codepoints above 0x10FFFF are used.
>>>
>>> See http://tools.ietf.org/html/rfc3629
>>
>> Can you be more concrete, what inputs does this happen for and what
>> output would you expect?  An example would help illustrate the problem.
>
> Example:   echo xn--1234xxxxxxxxxx | idn -u --debug

Thank you.  Interestingly, the punycode code from RFC 3492 happily
decodes the string to Unicode code points > U+10FFFF.  I can't see
anything in RFC 3492 (punycode) or RFC 3490 (IDNA ToUnicode) that
requires checking for code points > U+10FFFF, or where that check would
be done.  Arguably, the final conversion from UCS-4 to UTF-8 should
trigger an error in libidn, but by then the damage is already done:
ToUnicode has returned a sequence of code points which are illegal.  So,
it seems ToUnicode should perform this check somewhere, but reading
RFC 3492 and RFC 3490 I can't find a suitable place for it.  Thoughts?
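To make the question concrete, here is a minimal sketch in C (not
actual libidn code; the function name is made up for illustration) of
the kind of check ToUnicode or the UCS-4 to UTF-8 conversion could
perform: reject any decoded code point that is not a valid Unicode
scalar value, i.e. anything above U+10FFFF or in the surrogate range
U+D800..U+DFFF, per RFC 3629.

  #include <stdint.h>
  #include <stddef.h>

  /* Return 1 if every code point in UCS4 is a valid Unicode scalar
     value, 0 otherwise.  Hypothetical helper, for illustration only. */
  static int
  valid_unicode_scalar_values (const uint32_t *ucs4, size_t len)
  {
    size_t i;

    for (i = 0; i < len; i++)
      {
        if (ucs4[i] > 0x10FFFF)
          return 0;             /* beyond the Unicode code space */
        if (ucs4[i] >= 0xD800 && ucs4[i] <= 0xDFFF)
          return 0;             /* UTF-16 surrogate, not encodable in UTF-8 */
      }
    return 1;
  }

ToUnicode could run such a check on the punycode decoder's output
before returning, or the UTF-8 encoder could refuse to encode such
code points; the open question is which of the two is the right place.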

/Simon

