On Tue, Mar 9, 2010 at 6:04 PM, Aleksi Nurmi <[email protected]> wrote:
>
> 2010/3/10 Jonathan S. Shapiro <[email protected]>:
> > Do people think that is a sensible position?
>
> Honestly, I don't see a lot of arguments in favor of the 16-bit char,
> there. :-) There's the interop thing, and well... a 16-bit char has no
> other use: it doesn't represent anything meaningful, it's just a
> uint16. To satisfy interop requirements, adding a separate type for
> 16-bit code units seems by far the most sensible thing to do, and I
> don't see any real downsides. Interoperation between BitC and CTS
> isn't going to be straightforward in any case.
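As a side note, the "it's just a uint16" point is easy to see in any language that exposes UTF-16 code units. A quick Python sketch (purely illustrative, not part of BitC; the sample string is arbitrary) shows the same text under three of the views discussed below: UTF-8 bytes, code points, and 16-bit code units. A character outside the BMP becomes a surrogate pair in UTF-16, so a lone 16-bit unit need not correspond to any character at all:

```python
import struct

# Sample string: five BMP characters plus U+1D11E (MUSICAL SYMBOL G CLEF),
# which lies outside the BMP.
s = "héllo\N{MUSICAL SYMBOL G CLEF}"

# View 1: UTF-8 bytes.
utf8_bytes = list(s.encode("utf-8"))

# View 2: code points.
code_points = [ord(c) for c in s]

# View 3: 16-bit code units. The G clef encodes as a surrogate pair,
# so there is one more code unit than there are code points; the two
# surrogate values are meaningless in isolation -- just uint16s.
utf16 = s.encode("utf-16-le")
code_units = list(struct.unpack("<%dH" % (len(utf16) // 2), utf16))

print(len(utf8_bytes))   # 10 bytes
print(len(code_points))  # 6 code points
print(len(code_units))   # 7 code units (surrogate pair for the clef)
```

The grapheme view needs a segmentation library (UAX #29) and is omitted here, but the asymmetry above is the crux: none of these views is privileged, and the 16-bit one exists only for interop.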
Actually, that was my initial reaction, but it does have the consequence
that it pushes me into rebuilding the text library early. That's
something we need to do, but it would be nice to do it incrementally.

> Additionally, IMO Kevin is right and the main string type shouldn't
> prefer any particular way of indexing or iterating: graphemes, code
> points and UTF-8 bytes are all equally important views...

I agree, with one caveat: we should attempt to make sure that the BitC
string library always produces well-formed strings in order to ensure
that all of these views are meaningful.

> Applications in particular typically have no reason to use code points
> (nor code units!) at all; even input is better handled as strings. In
> fact, other than for implementing unicode-aware lexers, I don't know
> where code points are useful.

Applications are more or less forced to touch code points at sorting
time, and the unicode handling library of course needs to deal with
them. Other than those I tend to agree with you.

shap

_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
