Mark Davis 🚙 wrote:
>> I suspect the punycode goal is to take a wide character set into a
>> restricted character set, without caring much on resulting string
>> length; if the original string happens to be in other character set
>> than the target restricted character set, then the string length
>> increases too much to be of interest in the SMS discussion.
>
> That is not correct. One of the chief reasons that punycode was
> selected was the reduction in size.

But certainly the main motivation behind the development of Punycode, or any of the ACEs (ASCII-Compatible Encodings) that came before it, was to provide a compact encoding given the constraints of the set of characters allowed in domain names. The extensibility of the algorithm to target character sets of different sizes was definitely an advantage.

> Tests with the idnbrowser is not relevant. As I said:
>
>> In that form, it uses a smaller number of
>> bytes per character, but a parameterization allows use of all byte
>> values.
>
> That is, the parameterization of punycode for IDNA is restricted to
> the 36 IDNA values per byte, thus roughly 5 bits. When you
> parameterize punycode for a full 8 bits per byte, you get considerably
> different results.

Not to say this isn't so, but can you point to a tool or site where a user can type a string and see the output with different parameterizations? Pretty much all of the "Convert to Punycode" pages I see are only able to convert to the IDNA target.

--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell
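The encoder in RFC 3492 takes its Bootstring parameters (base, tmin, tmax, skew, damp, initial_bias, initial_n) as plain constants, so comparing parameterizations mostly means changing those numbers. Below is a minimal Python sketch under that assumption. The defaults are the RFC 3492 values used for IDNA (base 36), and `punycode_ascii` is checked against Python's built-in `punycode` codec. The names `Params`, `punycode_digits`, `punycode_ascii` and `BYTE_PARAMS` are hypothetical. In particular, the base-256 values in `BYTE_PARAMS` are an arbitrary choice that only satisfies the RFC's stated parameter constraints. They are not a published byte-oriented profile. A real 8-bit variant would also have to keep the delimiter out of the digit alphabet, which this sketch ignores by returning raw digit values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Params:
    """Bootstring parameters; defaults are the RFC 3492 (IDNA) values."""
    base: int = 36
    tmin: int = 1
    tmax: int = 26
    skew: int = 38
    damp: int = 700
    initial_bias: int = 72
    initial_n: int = 128  # code points below this are "basic" and copied as-is


IDNA_PARAMS = Params()
# Hypothetical byte-oriented tuning for experiments only: NOT a published profile.
BYTE_PARAMS = Params(base=256, tmax=200)


def _adapt(delta: int, numpoints: int, firsttime: bool, p: Params) -> int:
    """Bias adaptation function from RFC 3492."""
    delta = delta // p.damp if firsttime else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((p.base - p.tmin) * p.tmax) // 2:
        delta //= p.base - p.tmin
        k += p.base
    return k + ((p.base - p.tmin + 1) * delta) // (delta + p.skew)


def punycode_digits(text: str, p: Params = IDNA_PARAMS) -> tuple[str, list[int]]:
    """Return (basic characters, digit values) following the RFC 3492 encoder."""
    cps = [ord(ch) for ch in text]
    basic = "".join(ch for ch in text if ord(ch) < p.initial_n)
    n, delta, bias = p.initial_n, 0, p.initial_bias
    h = b = len(basic)
    digits: list[int] = []
    while h < len(cps):
        # Smallest code point not yet handled; jump delta forward to it.
        m = min(c for c in cps if c >= n)
        delta += (m - n) * (h + 1)
        n = m
        for c in cps:
            if c < n:
                delta += 1
            elif c == n:
                # Emit delta as a generalized variable-length integer.
                q, k = delta, p.base
                while True:
                    if k <= bias:
                        t = p.tmin
                    elif k >= bias + p.tmax:
                        t = p.tmax
                    else:
                        t = k - bias
                    if q < t:
                        break
                    digits.append(t + (q - t) % (p.base - t))
                    q = (q - t) // (p.base - t)
                    k += p.base
                digits.append(q)
                bias = _adapt(delta, h + 1, h == b, p)
                delta = 0
                h += 1
        delta += 1
        n += 1
    return basic, digits


def punycode_ascii(text: str) -> str:
    """Render IDNA (base-36) digits as a-z then 0-9, with '-' as delimiter."""
    alphabet = "abcdefghijklmnopqrstuvwxyz0123456789"
    basic, digits = punycode_digits(text, IDNA_PARAMS)
    tail = "".join(alphabet[d] for d in digits)
    return f"{basic}-{tail}" if basic else tail


if __name__ == "__main__":
    for word in ["bücher", "ελληνικά", "日本語テキスト"]:
        _, idna_digits = punycode_digits(word, IDNA_PARAMS)
        _, byte_digits = punycode_digits(word, BYTE_PARAMS)
        print(word, punycode_ascii(word), len(idna_digits), len(byte_digits))
```

The usage loop prints, for each word, the IDNA form and the number of digits each parameterization needs. With base 36 every digit is one ASCII character carrying about log2(36) ≈ 5.17 bits. With a base-256 tuning every digit is one full byte, so the digit counts (not the byte values themselves) are the comparable quantity.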