On Jan 24, 2012, at 2:11 PM, Mark S. Miller wrote:
> On Tue, Jan 24, 2012 at 12:33 PM, Allen Wirfs-Brock <[email protected]>
> wrote:
> Note that this proposal isn't currently under consideration for inclusion in
> ES.next, but the answer to your question is below
> [...]
> Just as the current definition of String specifies that a String is a
> sequence of 16-bit unsigned integer values, the proposal would specify that a
> String is a sequence of 32-bit unsigned integer values. In neither case is
> it required that the individual String elements be valid Unicode code
> points or code units. 8 hex digits are required to express the full range of
> unsigned 32-bit integers.
>
> Why 32? Unicode has only 21 bits of significance. Since we don't expect
> strings to be stored naively (taking up 4x the space that would otherwise be
> allocated),
I believe most current implementations actually store 16 bits per character, so
it would be 2x rather than 4x.
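To make the element-size point concrete, here is roughly how a
supplementary-plane character behaves with today's 16-bit string elements
(illustrative only, using U+1D11E MUSICAL SYMBOL G CLEF as an example):

var clef = "\uD834\uDD1E";         // U+1D11E, stored as a UTF-16 surrogate pair
clef.length;                       // 2 -- two 16-bit elements
clef.charCodeAt(0).toString(16);   // "d834" (high surrogate)
clef.charCodeAt(1).toString(16);   // "dd1e" (low surrogate)
// Under the proposal the same character would occupy a single element
// whose value is the code point 0x1D11E.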
>
> I don't see the payoff from choosing the next power of 2. The other choices I
> see are a) 21 bits, b) 53 bits, or c) unbounded.
The current 16-bit character strings are sometimes used to store non-Unicode
binary data and can be used with non-Unicode character encodings of up to
16 bits per character. 21 bits is sufficient for Unicode but perhaps not enough
for other useful encodings; 32 bits seems like a plausible unit.
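As an illustration of the binary-data use (a sketch of a common pattern, not
something the proposal prescribes), code like the following packs arbitrary
16-bit values into a string today:

function packWords(words) {                    // words: array of 0..0xFFFF ints
  return String.fromCharCode.apply(null, words);
}
function unpackWords(s) {
  var out = [];
  for (var i = 0; i < s.length; i++) out.push(s.charCodeAt(i));
  return out;
}
var packed = packWords([0xDEAD, 0xBEEF, 0xFFFE]);  // not meaningful as Unicode text
unpackWords(packed);                               // [0xDEAD, 0xBEEF, 0xFFFE]

The elements here are never intended to be characters at all, which is why the
definition doesn't require them to be valid code points or code units.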
The real controversy that developed over this proposal was whether or not
every individual Unicode character needs to be uniformly representable as a
single element of a String. This proposal took the position that it should.
Other voices felt that such uniformity was unnecessary and seemed content to
expose UTF-8 or UTF-16. Their argument was that applications may have to look
at multi-character logical units anyway, so dealing with UTF encodings isn't
much of an added burden.
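To make that trade-off concrete, here is a rough sketch (mine, not from the
proposal) of the kind of code an application ends up writing if the UTF-16
encoding is what gets exposed:

// Walk a string one code point at a time, combining surrogate pairs by hand.
function forEachCodePoint(s, fn) {
  for (var i = 0; i < s.length; i++) {
    var hi = s.charCodeAt(i);
    if (hi >= 0xD800 && hi <= 0xDBFF && i + 1 < s.length) {
      var lo = s.charCodeAt(i + 1);
      if (lo >= 0xDC00 && lo <= 0xDFFF) {
        fn(((hi - 0xD800) << 10) + (lo - 0xDC00) + 0x10000);
        i++;                                   // consumed two 16-bit units
        continue;
      }
    }
    fn(hi);                                    // BMP character or lone surrogate
  }
}
forEachCodePoint("A\uD834\uDD1E", function (cp) {
  // called with 0x41, then 0x1D11E
});

With uniform single-element characters the loop above collapses to simple
indexing; with exposed UTF-16, every correct consumer carries some variant of it.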
Allen