Re: Unicode, SMS and year 2012

Doug Ewell Sat, 28 Apr 2012 11:57:56 -0700

Mark Davis 🚙 wrote:

>> I suspect the punycode goal is to take a wide character set into a
>> restricted character set, without caring much on resulting string
>> length; if the original string happens to be in other character set
>> than the target restricted character set, then the string length
>> increases too much to be of interest in the SMS discussion.
>
> That is not correct. One of the chief reasons that punycode was
> selected was the reduction in size.


But certainly the main motivation behind the development of Punycode, or any of 
the ACEs (ASCII-Compatible Encodings) that came before it, was to provide a 
compact encoding given the constraints of the set of characters allowed in 
domain names. The extensibility of the algorithm to target character sets of 
different sizes was definitely an advantage.

> Tests with the idnbrowser is not relevant. As I said: 
>
>> In that form, it uses a smaller number of
>> bytes per character, but a parameterization allows use of all byte
>> values.
>
> That is, the parameterization of punycode for IDNA is restricted to
> the 36 IDNA values per byte, thus roughly 5 bits. When you
> parameterize punycode for a full 8 bits per byte, you get considerably
> different results.

Not to say this isn’t so, but can you point to a tool or site where a user can 
type a string and see the output with different parameterizations? Pretty much 
all of the “Convert to Punycode” pages I see are only able to convert to the 
IDNA target.
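
For anyone who wants to reproduce the IDNA-target numbers locally, here is a minimal sketch. It assumes Python 3 and its built-in "punycode" codec, which implements only the IDNA parameterization (36 output values), so it cannot show the full-8-bit variant described above. The helper name compare_sizes is made up for illustration.

```python
import math

def compare_sizes(text: str) -> dict:
    """Return the encoded length in bytes of `text` under several encodings.

    compare_sizes is a hypothetical helper for this illustration only.
    The "punycode" codec is Python's built-in one, fixed to the IDNA
    parameterization (base 36); no other parameterization is available here.
    """
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-be")),  # big-endian, no BOM
        "punycode-idna": len(text.encode("punycode")),
    }

# Information carried per output byte when only 36 values are allowed:
# log2(36), a little over 5 bits, versus 8 bits for an unrestricted byte.
bits_per_idna_byte = math.log2(36)

if __name__ == "__main__":
    sample = "bücher"
    print(sample.encode("punycode"))  # b'bcher-kva'
    print(compare_sizes(sample))
    print(round(bits_per_idna_byte, 2))
```

For "bücher" the IDNA-target output is "bcher-kva", 9 bytes against 7 in UTF-8 and 12 in UTF-16. Results for longer non-Latin strings will differ, and none of this says anything about an 8-bit parameterization.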

--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell
