On Mon, Oct 20, 2014 at 5:17 AM, William ML Leslie <[email protected]> wrote:
> The primary string type, with the most straightforward literal, which
> is given to you by a wide range of library functions, should probably
> be a text type.
>
> I think there is still a need for bytes literals, bytes formatting,
> and bytes String.join (.partition, .split ...).

Why can't those be operations involving byte arrays?

> I've used languages that didn't have a first class bytes type, and
> I've seen programmers jump from byte[] to String and back just to use
> string methods on network packets. I recommend having a
> fully-featured bytestring type.

Well, we already have int8/uint8, which are byte types, and we can make
vectors and arrays of those. If desired, we can make those immutable.
Offhand, I think that gives us the necessary data structure, and
implementing the operations doesn't really seem all that hard.

What is the difference, in your mind, between a byte string and a byte
vector or byte array?

> > 2. Does the following set of rules for strings make sense? If no, why not?
> >
> > Strings are normalized via NFC
> > String operations preserve NFC encoding
> > Strings are encoded in UTF-8
> > Strings are indexed by the byte
>
> You could probably convince me of this. In my head I want them to be
> opaque so that you can't obtain part of a character...

I'd like that too. The problem with Tim Čas's observation is that it
shows this to be unachievable. Even if you implement your strings as
32-bit code points, there are still characters that can't be expressed
in one code point. In a very real sense, characters don't exist in
Unicode.

shap
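As an aside, Python's immutable `bytes` type is a concrete example of the "operations on byte vectors" approach: it is essentially a uint8 vector that also carries the join/split/partition operations William mentions, which suggests nothing beyond a byte-vector representation is needed. A rough sketch (the packet contents here are made up for illustration):

```python
# An HTTP request line parsed entirely with byte-vector operations;
# no conversion to a text string is ever required.
packet = b"GET /index.html HTTP/1.1\r\nHost: example.org\r\n\r\n"

# partition/split work directly on byte sequences
request_line, _, rest = packet.partition(b"\r\n")
method, path, version = request_line.split(b" ")

assert method == b"GET"
assert path == b"/index.html"

# join reassembles byte chunks
assert b" ".join([method, path, version]) == request_line

# and the same data is visible as plain integers -- it really is
# just a vector of uint8 values
assert list(method) == [0x47, 0x45, 0x54]
```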
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
