Re: [bitc-dev] Unicode and bitc

Jonathan S. Shapiro Wed, 13 Oct 2010 15:26:14 -0700

On Wed, Oct 13, 2010 at 3:48 AM, Ben Kloosterman <[email protected]> wrote:


> Corrections
>
> 1)
>
> I stated .NET uses UCS-2 but it uses UTF-16 ( never realized all those
> indexes would take O(n) to find the position)
>

Actually not. It uses the same misbegotten encoding that Java uses - I don't
recall the name.

Note that the .NET "char" is defined as 16 bits. Extended code points can
*only* be encoded using strings, and .NET string indexing is defined to
operate w.r.t. 16-bit units (and therefore isn't robust on extended
characters). What happens in practice is that people carefully segregate
strings that might contain extended characters.
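
To make the indexing problem concrete, here is a small sketch. It is written in Python rather than .NET, and it only models a 16-bit-unit string by encoding to UTF-16; the helper name utf16_units is made up for illustration.

```python
import struct

def utf16_units(s):
    """Return the 16-bit code units a 16-bit-char string would store for s."""
    raw = s.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))

# 'a', MUSICAL SYMBOL G CLEF (U+1D11E, outside the 16-bit range), 'b'
text = "a\U0001D11Eb"
units = utf16_units(text)

print(len(text))   # number of code points
print(len(units))  # number of 16-bit units

# Indexing by unit lands on one half of a surrogate pair, not on a character.
print(hex(units[1]), hex(units[2]))
```

Under this model, index 1 and index 2 each name half of the same character, which is why unit-based indexing is fragile once extended characters show up.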


> 2) I was also under the impression that BitC offered C style mutable
> strings. So when I suggested removing index from string and convert to
> array that was what I meant.
>

BitC does not provide mutable strings.


> UCS-2, which offers O(1) indexing and finds but can't represent most Asian
> chars requiring non standard encoding upon the internal string
> representation and takes 2 bytes storage per character.
> UTF-8 With O(n) indexing  which allows the developer to refer to the
> character. Note on x86 you can use a fast SSE2 0x10 bit pattern scan to
> count characters quicker.
> UTF-8  with O(1) byte indexing with more runtime method focus and
> ToFixedCharArray methods for char indexing.
>

Good list, but incomplete. Ropes with O(log(n)) indexing work just fine in
practice.
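
A minimal sketch of the rope idea, in Python, assuming a binary tree whose leaves hold short strings and whose interior nodes cache the total length of their subtree. The class names Leaf and Concat are hypothetical, and the O(log(n)) bound only holds if the tree is kept balanced, which this sketch does not attempt.

```python
class Leaf:
    """A rope leaf holding a short run of characters."""
    def __init__(self, text):
        self.text = text
        self.length = len(text)

    def index(self, i):
        return self.text[i]

class Concat:
    """An interior node: the concatenation of two sub-ropes."""
    def __init__(self, left, right):
        self.left = left
        self.right = right
        self.length = left.length + right.length  # cached subtree length

    def index(self, i):
        # Descend into whichever side holds position i.
        if i < self.left.length:
            return self.left.index(i)
        return self.right.index(i - self.left.length)

rope = Concat(Concat(Leaf("Uni"), Leaf("code ")),
              Concat(Leaf("and "), Leaf("bitc")))

print(rope.length)
print(rope.index(0), rope.index(8), rope.index(15))
```

Each index call walks one root-to-leaf path, so its cost follows the tree depth rather than the string length.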

Also worth noting that indexing is almost *never* random. The get
next/previous operation speed is much more important than finding the
initial location.
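
As an illustration of cheap next/previous stepping, here is a sketch over UTF-8 bytes in Python. It assumes the buffer is valid UTF-8 and that pos already sits on a character boundary; the function names next_char and prev_char are made up for this example.

```python
def next_char(buf, pos):
    """Byte offset of the character following the one that starts at pos."""
    pos += 1
    # Skip continuation bytes, which all have the form 10xxxxxx.
    while pos < len(buf) and (buf[pos] & 0xC0) == 0x80:
        pos += 1
    return pos

def prev_char(buf, pos):
    """Byte offset of the character preceding the boundary at pos."""
    pos -= 1
    while pos > 0 and (buf[pos] & 0xC0) == 0x80:
        pos -= 1
    return pos

# 'a' (1 byte), EURO SIGN (3 bytes), MUSICAL SYMBOL G CLEF (4 bytes), 'b' (1 byte)
buf = "a\u20ac\U0001D11Eb".encode("utf-8")

pos = 0
while pos < len(buf):
    print(pos)
    pos = next_char(buf, pos)
```

Each step touches at most four bytes, so sequential traversal stays cheap even though finding the n-th character from scratch is O(n).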

shap

_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
