On Wed, Mar 10, 2010 at 12:37 AM, Aleksi Nurmi <[email protected]> wrote:
> ...it would make sense to implement BitC's string type in UTF-16,
> implemented by the string type of the runtime. UTF-16 strings are also
> more compact, but lose the constant time random access to code points.

No! It would make sense to specify the BitC string type in such a way
that the internal representation of string is not constrained, and
provide *accessors* that mate with the various preferred encodings.
This would allow us to use the native string representation of the
platform without being married to a particular representation choice.

For example: if compactness is a goal, we should be going after UTF-8,
not UTF-16.
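
A quick illustration of the size difference, assuming mostly-ASCII text
(the common case for identifiers, source, markup, and so on):

    import java.nio.charset.StandardCharsets;

    public class EncodingSize {
        public static void main(String[] args) {
            String s = "hello, world";  // 12 ASCII characters
            // 12 bytes in UTF-8, 24 bytes in UTF-16 (no BOM)
            System.out.println(s.getBytes(StandardCharsets.UTF_8).length);
            System.out.println(s.getBytes(StandardCharsets.UTF_16LE).length);
        }
    }

UTF-16 is only more compact for text dominated by characters in the
U+0800..U+FFFF range (CJK, mostly); for ASCII-heavy text UTF-8 is half
the size.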

> Using a UTF-16 string type doesn't mean that the char type should be
> a 16-bit code unit. For reasons already stated, a string type that
> always contains valid unicode doesn't really require access to the
> underlying code units.

I agree. The problem is that s[i] in the CLR/JVM returns code units, and
the CLR/JVM *mislabels* those units "char". The question might be
re-framed as: how often do fields and parameters of type "char" appear
at CLI/JVM interop boundaries?

> I suppose that since representation matters, the representation of the
> string type cannot be implementation-dependent even if the string type
> doesn't provide direct access to its representation.

In this case I think that it can, though it may be worth a longer
discussion about why I think so.

shap
