On Jan 24, 2012, at 11:45 PM, Norbert Lindenberg wrote:

> I don't see the standard allowing character encodings other than UTF-16 in 
> strings. Section 8.4 says "When a String contains actual textual data, each 
> element is considered to be a single UTF-16 code unit." This aligns with 
> other normative references to UTF-16 in sections 2, 6, and 15.1.3. Section 
> 8.4 does seem to allow the use of strings for non-textual data, but character 
> encodings are by definition for characters, i.e., textual data.

8.4 definitely allows for non-textual data: "String type is ... sequences of 
... 16-bit unsigned integer values...", "The String type is generally used to 
represent textual data...", "All operations on Strings ... treat them as 
sequences of undifferentiated 16-bit unsigned integers..."

Arbitrary 16-bit values can be placed in a String using either 
String.fromCharCode (15.5.3.2) or the \uxxxx notation in string literals.  
Neither of these enforces a requirement that individual String elements be 
valid Unicode code units.
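
As a rough sketch of what that means in practice (the variable names are 
only for illustration), both forms will happily store a lone surrogate or 
any other 16-bit value:

```javascript
// Illustrative sketch: variable names are made up for this example.
// 0xD800 is a lone high surrogate, not a valid character on its own.
var viaFromCharCode = String.fromCharCode(0xD800, 0xFFFF, 0x0000);
var viaEscape = "\uD800\uFFFF\u0000";

// Both forms store the same three 16-bit elements, unchanged.
var sameElements = viaFromCharCode === viaEscape;

// Read the elements back as plain numbers.
var codes = [];
for (var i = 0; i < viaEscape.length; i++) {
  codes.push(viaEscape.charCodeAt(i));
}

// fromCharCode applies ToUint16, so larger values wrap modulo 2^16.
var wrapped = String.fromCharCode(0x10041).charCodeAt(0);
```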

The standard always encodes strings expressed as string literals (except for 
literals containing \u escapes) using Unicode.  However, such literals are 
restricted to containing characters in the BMP, so all such characters are 
encoded as single 16-bit String elements. 

The functions in 15.1.3 do UTF-8 encoding/decoding but only if the actual 
string arguments contain well-formed UTF data.  They explicitly throw when 
encountering other data.  This is a characteristic of these specific functions, 
not of strings in general.
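
A small sketch of the difference, assuming a hypothetical helper 
(thrownName) that exists only for this example: the String value itself is 
fine, and it is only the 15.1.3 functions that reject it.

```javascript
// Hypothetical helper, only for this example: returns the name of the
// error thrown by fn, or null if fn completes normally.
function thrownName(fn) {
  try {
    fn();
    return null;
  } catch (e) {
    return e.name;
  }
}

// A lone high surrogate is a perfectly legal String value...
var loneSurrogate = "\uD800";
var loneLength = loneSurrogate.length;

// ...but the URI functions refuse to encode it as UTF-8.
var encodeError = thrownName(function () {
  encodeURIComponent(loneSurrogate);
});

// Decoding rejects escapes that are not well-formed UTF-8 (0xFF is never
// a valid UTF-8 lead byte).
var decodeError = thrownName(function () {
  decodeURIComponent("%FF");
});

// A well-formed surrogate pair (U+1F600) encodes without complaint.
var encodedPair = encodeURIComponent("\uD83D\uDE00");
```
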
> 
> Using a Unicode escape for non-textual data seems like abuse to me - Unicode 
> is a character encoding standard. For Unicode, anything beyond six hex digits 
> is excessive.

I see no intent in the spec. that \u or String.fromCharCode was to be 
restricted to valid Unicode character encodings.

Any character encoding is simply a semantic interpretation of binary values.  
There is no particular reason that "text" encoded using non-Unicode encodings 
(say, for example, EBCDIC) can't be represented using ES String values, and 
most of the String methods would work fine with such textual data.  You would 
probably want to do exactly that if you were writing code that had to deal 
with character set conversions.
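
For instance, something along these lines would work.  This is only a 
sketch: the table is a tiny, partial EBCDIC (code page 037) mapping that I 
made up for the example, and toUnicode is a hypothetical helper.

```javascript
// Sketch only: a partial EBCDIC (code page 037) table covering just the
// letters used below. A real converter would need all 256 entries.
var ebcdicToUnicode = { 0xC8: "H", 0xC5: "E", 0xD3: "L", 0xD6: "O" };

// "HELLO" held as EBCDIC code values, one per String element.
var ebcdicText = String.fromCharCode(0xC8, 0xC5, 0xD3, 0xD3, 0xD6);

// Ordinary String methods operate on the raw 16-bit values.
var firstL = ebcdicText.indexOf(String.fromCharCode(0xD3));
var tail = ebcdicText.slice(3);

// Character set conversion: map each element through the table.
function toUnicode(s) {
  var out = "";
  for (var i = 0; i < s.length; i++) {
    out += ebcdicToUnicode[s.charCodeAt(i)];
  }
  return out;
}
var converted = toUnicode(ebcdicText);
```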


Allen


_______________________________________________
es-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es-discuss
