On Wed, Mar 10, 2010 at 1:37 PM, Eric Rannaud <[email protected]> wrote:
> Are strings arrays of char and is that exposed by the API?

I wish the question were that simple. The answers are "maybe", and "yes and no".

In the CLR and the JVM, the specifications take no position on the
underlying representation of strings. An implementation could feasibly
use UTF-8, UTF-16, or UTF-32, or even make up a completely new
encoding.

In CLR/JVM, the default string indexing operator indexes in terms of
16-bit code units (UCS-2, or more precisely UTF-16, code units).
Because of this, the most natural implementation of strings in these
systems is "vector of 16-bit code units", and this is the
implementation that is commonly used.

However, there is no requirement in either runtime that string
indexing be a unit-time operation. Given a UTF-8 string
representation, it would be a correct implementation if s[i] had to
walk the string to determine and extract the indexed code unit.
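
As a rough sketch of what that could look like (in Java, since its
String shares the 16-bit indexing behavior): the class name
Utf8BackedString is made up for illustration, and the code assumes the
stored bytes are well-formed UTF-8. It is not how any real runtime
stores strings.

```java
// Hypothetical class, for illustration only: a string stored as UTF-8 that
// still answers index queries in terms of UTF-16 code units.
final class Utf8BackedString {
    private final byte[] utf8; // assumed to be well-formed UTF-8

    Utf8BackedString(String s) {
        this.utf8 = s.getBytes(java.nio.charset.StandardCharsets.UTF_8);
    }

    // Returns the UTF-16 code unit at index i, as String.charAt would.
    // The cost is linear in i because the bytes are walked from the start.
    char charAt(int i) {
        if (i < 0) throw new IndexOutOfBoundsException("index " + i);
        int pos = 0;  // byte offset of the current UTF-8 sequence
        int unit = 0; // UTF-16 code unit index where that sequence starts
        while (pos < utf8.length) {
            int b0 = utf8[pos] & 0xFF;
            int len; // number of bytes in this UTF-8 sequence
            int cp;  // code point being decoded
            if (b0 < 0x80)      { len = 1; cp = b0; }
            else if (b0 < 0xE0) { len = 2; cp = b0 & 0x1F; }
            else if (b0 < 0xF0) { len = 3; cp = b0 & 0x0F; }
            else                { len = 4; cp = b0 & 0x07; }
            for (int k = 1; k < len; k++) {
                cp = (cp << 6) | (utf8[pos + k] & 0x3F);
            }
            int units = Character.charCount(cp); // 1 in the BMP, else 2
            if (i < unit + units) {
                if (units == 1) return (char) cp;
                return (i == unit) ? Character.highSurrogate(cp)
                                   : Character.lowSurrogate(cp);
            }
            unit += units;
            pos += len;
        }
        throw new IndexOutOfBoundsException("index " + i);
    }
}
```

For any well-formed Java string s, new Utf8BackedString(s).charAt(i)
should agree with s.charAt(i); only the cost of the lookup differs.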

This is why I keep saying that we can simultaneously have accessors in
terms of code units (for compatibility) and in terms of code points.
The main reason this works is that strings are immutable. Pulling it
off merely requires that all strings be well formed.
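
Java already happens to offer both views on its immutable String, so
here is a small sketch of the two accessors side by side. The helper
names (DualIndexing, codeUnitAt, codePointAtIndex, isWellFormed) are
invented for illustration; the String and Character methods they call
are standard. "Well formed" is taken here to mean that every surrogate
is part of a proper pair.

```java
// Hypothetical helpers, for illustration only.
final class DualIndexing {
    // Compatibility accessor: the i-th UTF-16 code unit.
    static char codeUnitAt(String s, int i) {
        return s.charAt(i);
    }

    // Code point accessor: the i-th code point, where a surrogate pair
    // counts as a single position. Walks the string, so it is linear in i.
    static int codePointAtIndex(String s, int i) {
        return s.codePointAt(s.offsetByCodePoints(0, i));
    }

    // True if every high surrogate is immediately followed by a low
    // surrogate and no low surrogate appears on its own.
    static boolean isWellFormed(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 == s.length()
                        || !Character.isLowSurrogate(s.charAt(i + 1))) {
                    return false;
                }
                i++; // skip the paired low surrogate
            } else if (Character.isLowSurrogate(c)) {
                return false;
            }
        }
        return true;
    }
}
```

On the string "a", U+1F600, "z", codeUnitAt(s, 1) yields the high
surrogate 0xD83D, while codePointAtIndex(s, 1) yields 0x1F600.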

> If not, then just make char as large as ever necessary (uint32?). The
> only way to get a char from a string is with things like:
>    char c = string.getCharacter(i)
>
> If the language does not expose the string representation in any way,
> why would you not make char the largest you ever need?

This is indeed the direction I am leaning. Further, I am leaning in
the direction of declaring that s[i] *in BitC* returns a 32-bit code
point rather than a 16-bit code unit. This is self-consistent, but it
deviates from JVM/CLR System.String indexing behavior.

But once you make this decision, a question of interoperability
arises. Suppose we import a module that is written in C#, and this
module publishes a type:

    public struct Example {
        public char c;   // note: C# char, a.k.a. CLR System.Char, or UCS-2
        public int i32;
    }

We need to be able to represent this type in BitC. What BitC type
should be given to the field c?
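
To make the mismatch concrete, here is a tiny sketch (Java again; the
helper name isScalarValue is made up for illustration). A 16-bit char
field can legally hold one half of a surrogate pair, which is a code
unit but not, on its own, a character in the code point sense. This
only illustrates the question; it does not settle what the BitC type
should be.

```java
// Hypothetical helper, for illustration only.
final class CodeUnitVsCodePoint {
    // True if the 16-bit code unit, taken alone, is a Unicode scalar value,
    // i.e. it is not one half of a surrogate pair.
    static boolean isScalarValue(char unit) {
        return !Character.isSurrogate(unit);
    }
}
```

A field holding 'A' passes this check; a field holding 0xD83D (the
first half of U+1F600) does not.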


shap
