On Wed, 2013-06-05 at 22:57 +0200, Simon Josefsson wrote:
> It is not trivial, and there may be multiple reasonable implementations.
> I have been meaning to write up one way to do it, and to implement that,
> in the hope that it could be established as a standard, but haven't
> found time.  I recall sending a short summary of the steps required to
> the IDNA list (I think) a long time ago when I noticed this issue with
> IDNA2008.

I see...

> > Libidn2 doesn't seem to supply such a function yet, the
> > older Libidn (at least the cmd line tool) doesn't either
> > really, but I can manually split the punycode part from
> > the xn-- in each label and then use Libidn's punycode decoder
> > to reach my goal. Seems a bit of a hassle though.
>
> Yup, something like this is what a library could implement.  There are
> aspects which are unclear (for example, how to split the domain?  On
> ASCII dot '.' only, or the IDNA2003 domain separators?  Should you split
> on escaped dots?).

Hmm, I just noticed that the idnkit2.2 guys have actually implemented
their own interpretation of reverse conversion now. Here's some of what
they do:

    python t.py | /usr/local/bin/idnconv2 -reverse
    www.buße.de
    www․buße․de
    www‥buße‥de
    www…buße…de
    www⒈buße⒈de
    www⒉buße⒉de
    www⒊buße⒊de
    www⒋buße⒋de
    www⒌buße⒌de
    www⒍buße⒍de
    www⒎buße⒎de
    www⒏buße⒏de
    www⒐buße⒐de
    www⒑buße⒑de
    www⒒buße⒒de
    www⒓buße⒓de
    www⒔buße⒔de
    www⒕buße⒕de
    www⒖buße⒖de
    www⒗buße⒗de
    www⒘buße⒘de
    www⒙buße⒙de
    www⒚buße⒚de
    www⒛buße⒛de
    www㏂buße㏂de
    www㏇buße㏇de
    www㏘buße㏘de
    www︙buße︙de
    www︰buße︰de
    www﹒buße﹒de
    www.buße.de
    www🄀buße🄀de

t.py (Python 2):

    # Build one test name per code point listed in 'lst', using that
    # code point as the label separator around the ACE label.
    for l in file('lst').readlines():
        if not l.startswith('U+'):
            continue
        ustr = l.split()[0].split('+')[1]   # hex digits after "U+"
        u = unichr(int(ustr, 16))
        print (u'www%sxn--bue-6ka%sde' % (u, u)).encode('utf-8')

'lst' contains the text copied and pasted from
http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:toNFKC=/\./:]

They don't interpret %2E, however:

    echo "www%2Exn--bue-6ka%2Ede" | /usr/local/bin/idnconv2 -reverse
    www%2exn--bue-6ka%2ede

But to be honest, I don't really understand the intricacies of
IDNA2003/2008 and the whole Unicode character transformation and
classification rules. That's why I am happy to use your libraries
whenever possible ;=)

Regards,
Thomas
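
P.S. For reference, a minimal sketch of the manual approach discussed above
(split the name into labels, strip the xn-- prefix, punycode-decode the
rest). It is written in Python 3 and uses only the standard library's
"punycode" codec. The names reverse_convert and label_to_unicode and the
idna2003_dots switch are made up for this illustration; they are not part
of Libidn, Libidn2 or idnkit. The sketch performs no IDNA2003/IDNA2008
validation or normalization, and it only knows the ASCII dot and,
optionally, the three additional IDNA2003 separators, so it is just one
possible interpretation and not a replacement for a proper library
function.

```python
import re

ACE_PREFIX = "xn--"

# Capturing groups make re.split() keep the separators in its result.
ASCII_DOT = r"(\.)"
# Assumption: ASCII dot plus the three extra IDNA2003 label separators.
IDNA2003_DOTS = "([.\u3002\uff0e\uff61])"


def label_to_unicode(label):
    """Decode a single ACE label; return it unchanged if it is not one."""
    if not label.lower().startswith(ACE_PREFIX):
        return label
    try:
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    except UnicodeError:
        # Not valid punycode: leave the label as it was.
        return label


def reverse_convert(domain, idna2003_dots=False):
    """Decode every xn-- label in domain, keeping the separators as given."""
    pattern = IDNA2003_DOTS if idna2003_dots else ASCII_DOT
    parts = re.split(pattern, domain)
    # Even indices hold labels, odd indices hold the separators.
    parts[0::2] = [label_to_unicode(p) for p in parts[0::2]]
    return "".join(parts)


if __name__ == "__main__":
    print(reverse_convert("www.xn--bue-6ka.de"))
    # Escaped dots are not treated as separators here either.
    print(reverse_convert("www%2Exn--bue-6ka%2Ede"))
```

Whether to split on more than the ASCII dot is exactly the open question
raised above, which is why the sketch leaves it as a switch instead of
picking one answer.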