On 10 March 2010 23:58, Jonathan S. Shapiro <[email protected]> wrote:
> On Wed, Mar 10, 2010 at 2:05 PM, Eric Northup <[email protected]>
> wrote:
>> Given that the notion of "char" turns out to be a bit confused, why not
>> skip it in BitC?  That is: don't have a "char" type at all.
>
> Yeah. Most of the unicode-supporting languages are explicit that "char" no
> longer means "character" in the human sense. This is one of those cases
> where you're going to have confusion no matter what, so the question comes
> down to whether it's better to introduce *another* confusion.
>
>> CodePoint is a good type name for the full-word thing, and CodeUnit
>> seems as good a name as any for what CLI calls [MSCorlib]System.Char...
>> or perhaps UTF16Unit?
>
> Or UCS-2, which is precise.
>
> So I think you are proposing the following position:
>
> BitC string has unspecified representation. In a CLR implementation it will
> probably be implemented using System.String, but other representations can
> be considered.
>
> Conversion from BitC.String to System.String is therefore "free".
> Conversion from System.String to BitC.String is representation-preserving,
> but requires validation
>
> System.Char is typed in BitC as "BitC.UCS2".
> System.String is typed in BitC as "BitC.UCS2 Vector".
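
To make the "requires validation" step concrete, here is a minimal sketch in Python. It assumes that validation means rejecting unpaired surrogates in a sequence of 16-bit units, which the thread does not spell out; the function name is hypothetical and not part of any BitC or CLR API.

```python
def is_well_formed_utf16(units):
    """Return True if the 16-bit code units contain no unpaired surrogates.

    Assumption: this is the check a System.String -> BitC.String
    conversion would need; the thread does not define it.
    """
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: must be followed by a low surrogate.
            if i + 1 >= len(units) or not (0xDC00 <= units[i + 1] <= 0xDFFF):
                return False
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:
            # Low surrogate with no preceding high surrogate.
            return False
        else:
            i += 1
    return True
```

The check is linear in the length of the string and leaves the representation untouched, which matches the "representation-preserving" wording above.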

Does this amend the contract to provide these types on any platform?

Would the contract be amended for any other platform-specific string
representation?

Beware of the Chinese user who comes and politely says that BitC
sucks because it can only do Unicode, and he needs additional
characters for his Ancient Chinese texts (for which there is a
non-Unicode encoding).

Note also the recent JIS amendment that assigns a single codepoint to
characters that have to be represented by multiple codepoints
(a character with modifier(s)) in Unicode. Converting from this
encoding to Unicode does give the same characters, but loses the
information about whether they were decomposed or not.

> BitC.Char, if present, is a type alias for BitC.UCS4, a.k.a Unicode Code
> Points.

I guess we can avoid a Char altogether since it is just a confusing alias.
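
To illustrate why a single "char" is ambiguous, here is a small Python sketch that views the same text as code points (what the thread calls UCS4 / CodePoint) and as 16-bit code units (what System.Char holds). The helper names are hypothetical.

```python
def code_points(text):
    """Return the Unicode code points of text as integers."""
    return [ord(ch) for ch in text]


def utf16_code_units(text):
    """Return the 16-bit code units of text when encoded as UTF-16."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


# 'a' followed by U+1D11E MUSICAL SYMBOL G CLEF, which lies outside the BMP.
sample = "a\U0001D11E"

points = code_points(sample)      # two code points
units = utf16_code_units(sample)  # three code units (one surrogate pair)
```

The second character is one code point but two code units, so "length in chars" gives different answers depending on which of the two a Char is taken to mean.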

>
> Is that it?
>
> I think that this is one consistent position. The other consistent position
> would be that "BitC.char" is a type alias for BitC.UCS2.

What would this be consistent with?


Thanks

Michal