On Wed, Mar 10, 2010 at 12:37 AM, Aleksi Nurmi <[email protected]> wrote:

> ...it would make sense to implement BitC's string type in UTF-16,
> implemented by the string type of the runtime. UTF-16 strings are also
> more compact, but lose the constant time random access to code points.
No! It would make sense to specify the BitC string type in such a way
that the internal representation of strings is not constrained, and to
provide *accessors* that mate with the various preferred encodings.
This would allow us to use the native string representation of the
platform without being married to a particular representation choice.
For example: if compactness is a goal, we should be going after UTF-8,
not UTF-16.

> Using an UTF-16 string type doesn't mean that the char type should be
> a 16-bit code unit. For reasons already stated, a string type that
> always contains valid unicode doesn't really require access to the
> underlying code units.

I agree. The problem is that s[i] in the CLR/JVM returns code units,
and the CLR/JVM *mislabels* those units "char". The question might be
re-framed as: how often do fields and parameters of type "char" appear
at CLR/JVM interop boundaries?

> I suppose that since representation matters, the representation of the
> string type cannot be implementation-dependent even if the string type
> doesn't provide direct access to its representation.

In this case I think that it can, though it may be worth a longer
discussion about why I think so.

shap
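
For concreteness, a small Java sketch of the code-unit vs. code-point
distinction discussed above. Every method called is standard JDK API
(java.lang.String and java.lang.Integer); only the class name is
invented for the example:

    // U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the BMP, so in UTF-16
    // it occupies two code units (a surrogate pair) but is one code point.
    public class CodeUnitsVsCodePoints {
        public static void main(String[] args) {
            String s = "\uD834\uDD1E";  // one code point, two code units

            System.out.println(s.length());                             // 2     -- counts UTF-16 code units
            System.out.println(Integer.toHexString(s.charAt(0)));       // d834  -- a lone surrogate "char"
            System.out.println(Integer.toHexString(s.codePointAt(0)));  // 1d11e -- the actual code point
            System.out.println(s.codePointCount(0, s.length()));        // 1     -- one code point
        }
    }

A representation-agnostic string type of the kind shap proposes would
expose only code-point-level accessors (plus encoding-specific views on
request), leaving the choice between UTF-8, UTF-16, or something else
to the implementation.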
