On Wed, Mar 10, 2010 at 2:05 PM, Eric Northup <[email protected]>
wrote:
> Given that the notion of "char" turns out to be a bit confused, why not
> skip it in BitC?  That is: don't have a "char" type at all.

Yeah. Most of the Unicode-supporting languages are explicit that "char" no
longer means "character" in the human sense. This is one of those cases
where you're going to have confusion no matter what, so the question comes
down to whether it's better to introduce *another* confusion.
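To make the confusion concrete: Java's char, like System.Char, is a UTF-16
code unit rather than a character, so a single code point outside the BMP
occupies two chars (a surrogate pair). A minimal illustration:

```java
public class CharDemo {
    public static void main(String[] args) {
        // U+1D11E MUSICAL SYMBOL G CLEF lies outside the BMP, so in
        // UTF-16 it is encoded as a surrogate pair of two code units.
        String clef = "\uD834\uDD1E";

        // length() counts UTF-16 code units, not characters.
        System.out.println(clef.length());                          // 2
        // codePointCount() counts Unicode code points.
        System.out.println(clef.codePointCount(0, clef.length()));  // 1
    }
}
```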

> CodePoint is a good type name for the full-word thing, and CodeUnit
> seems as good a name as any for what CLI calls [MSCorlib]System.Char...
> or perhaps UTF16Unit?

Or UCS-2, which is precise.

So I think you are proposing the following position:

   - BitC string has unspecified representation. In a CLR implementation it
   will probably be implemented using System.String, but other representations
   can be considered.
      - Conversion from BitC.String to System.String is therefore "free".
      - Conversion from System.String to BitC.String is
      representation-preserving, but requires validation.
   - System.Char is typed in BitC as "BitC.UCS2".
   - System.String is typed in BitC as "BitC.UCS2 Vector".
   - BitC.Char, if present, is a type alias for BitC.UCS4, a.k.a. Unicode
   Code Points.

Is that it?
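One plausible reading of the "requires validation" step above is a check
that the System.String is well-formed UTF-16, i.e. contains no unpaired
surrogates. A hedged sketch in Java (whose char, like System.Char, is a
UTF-16 code unit); the name isWellFormedUtf16 is my own, not anything in
BitC or the CLI:

```java
public class Utf16Check {
    // Returns true iff s contains no unpaired surrogate code units,
    // i.e. every high surrogate is immediately followed by a low one
    // and no low surrogate appears on its own.
    static boolean isWellFormedUtf16(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 >= s.length()
                        || !Character.isLowSurrogate(s.charAt(i + 1)))
                    return false;
                i++; // skip the low half of the pair
            } else if (Character.isLowSurrogate(c)) {
                return false; // low surrogate with no preceding high
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isWellFormedUtf16("abc"));         // true
        System.out.println(isWellFormedUtf16("\uD834\uDD1E")); // true
        System.out.println(isWellFormedUtf16("\uD834"));       // false
    }
}
```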

I think that this is one consistent position. The other consistent position
would be that "BitC.char" is a type alias for BitC.UCS2.

So I think we are down to: which way should the BitC.char type alias go?


shap
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
