On Mon, Oct 20, 2014 at 5:17 AM, William ML Leslie <[email protected]> wrote:
> The primary string type, with the most straightforward literal, which
> is given to you by a wide range of library functions, should probably
> be a text type.
>
> I think there is still a need for bytes literals, bytes formatting,
> and bytes String.join (.partition, .split ...).

Why can't those be operations involving byte arrays?

> I've used languages that didn't have a first class bytes type, and
> I've seen programmers jump from byte[] to String and back just to use
> string methods on network packets. I recommend having a
> fully-featured bytestring type.

Well, we already have int8/uint8, which are byte types, and we can make
vectors and arrays of those. If desired, we can make those immutable.
Offhand, I think that gives us the necessary data structure, and
implementing the operations doesn't really seem all that hard.

What is the difference, in your mind, between a byte string and a byte
vector or byte array?

> > 2. Does the following set of rules for strings make sense? If no, why not?
> >
> > Strings are normalized via NFC
> > String operations preserve NFC encoding
> > Strings are encoded in UTF-8
> > Strings are indexed by the byte
>
> You could probably convince me of this. In my head I want them to be
> opaque so that you can't obtain part of a character...

I'd like that too. The problem with Tim Čas's observation is that it
shows this to be unachievable. Even if you implement your strings as
32-bit code points, there are still characters that can't be expressed
in one code point. In a very real sense, characters don't exist in
Unicode.

shap
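As an aside, Python's immutable `bytes` type is a concrete example of the "operations on byte vectors" approach: it is essentially a uint8 vector that also carries the join/split/partition operations William mentions, which suggests nothing beyond a byte-vector representation is needed. A rough sketch (the packet contents here are made up for illustration):

```python
# An HTTP request line parsed entirely with byte-vector operations;
# no conversion to a text string is ever required.
packet = b"GET /index.html HTTP/1.1\r\nHost: example.org\r\n\r\n"

# partition/split work directly on byte sequences
request_line, _, rest = packet.partition(b"\r\n")
method, path, version = request_line.split(b" ")

assert method == b"GET"
assert path == b"/index.html"

# join reassembles byte chunks
assert b" ".join([method, path, version]) == request_line

# and the same data is visible as plain integers -- it really is
# just a vector of uint8 values
assert list(method) == [0x47, 0x45, 0x54]
```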
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
