On Tue, Mar 9, 2010 at 6:04 PM, Aleksi Nurmi <[email protected]> wrote:
>
> 2010/3/10 Jonathan S. Shapiro <[email protected]>:
> > Do people think that is a sensible position?
>
> Honestly, I don't see a lot of arguments in favor of the 16-bit char,
> there. :-) There's the interop thing, and well... a 16-bit char has no
> other use: it doesn't represent anything meaningful, it's just a
> uint16. To satisfy interop requirements, adding a separate type for
> 16-bit code units seems by far the most sensible thing to do, and I
> don't see any real downsides. Interoperation between BitC and CTS
> isn't going to be straightforward in any case.
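As a side note, the "it's just a uint16" point is easy to see in any language that exposes UTF-16 code units. A quick Python sketch (purely illustrative, not part of BitC; the sample string is arbitrary) shows the same text under three of the views discussed below: UTF-8 bytes, code points, and 16-bit code units. A character outside the BMP becomes a surrogate pair in UTF-16, so a lone 16-bit unit need not correspond to any character at all:

```python
import struct

# Sample string: five BMP characters plus U+1D11E (MUSICAL SYMBOL G CLEF),
# which lies outside the BMP.
s = "héllo\N{MUSICAL SYMBOL G CLEF}"

# View 1: UTF-8 bytes.
utf8_bytes = list(s.encode("utf-8"))

# View 2: code points.
code_points = [ord(c) for c in s]

# View 3: 16-bit code units. The G clef encodes as a surrogate pair,
# so there is one more code unit than there are code points; the two
# surrogate values are meaningless in isolation -- just uint16s.
utf16 = s.encode("utf-16-le")
code_units = list(struct.unpack("<%dH" % (len(utf16) // 2), utf16))

print(len(utf8_bytes))   # 10 bytes
print(len(code_points))  # 6 code points
print(len(code_units))   # 7 code units (surrogate pair for the clef)
```

The grapheme view needs a segmentation library (UAX #29) and is omitted here, but the asymmetry above is the crux: none of these views is privileged, and the 16-bit one exists only for interop.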
Actually, that was my initial reaction, but it does have the consequence
that it pushes me into rebuilding the text library early. That's
something we need to do, but it would be nice to do it incrementally.

> Additionally, IMO Kevin is right and the main string type shouldn't
> prefer any particular way of indexing or iterating: graphemes, code
> points and UTF-8 bytes are all equally important views...

I agree, with one caveat: we should attempt to make sure that the BitC
string library always produces well-formed strings in order to ensure
that all of these views are meaningful.

> Applications in particular typically have no reason to use code points
> (nor code units!) at all; even input is better handled as strings. In
> fact, other than for implementing unicode-aware lexers, I don't know
> where code points are useful.

Applications are more or less forced to touch code points at sorting
time, and the unicode handling library of course needs to deal with
them. Other than those I tend to agree with you.

shap

_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
