On Wed, 01 Dec 2010 03:30:07 -0500, foobar <[email protected]> wrote:
Steven Schveighoffer Wrote:
[snipped]
> 3. You have no access to the underlying array unless you're dealing with an
> actual array of dchar.
I thought of adding some kind of access. I wasn't sure of the best way.
I was thinking of allowing direct access via opCast, because I think
casting might be a sufficient red flag to let you know you are crossing
into dangerous waters.
But it could be just as easy to make the array itself public.
-Steve
A string type should always maintain the invariant that it is a valid
Unicode string. Therefore I don't like having an unsafe opCast or
providing direct access to the underlying array. I feel that there
should be a read-only property for that. Algorithms that manipulate
char[]'s should construct a new string instance, which will validate that
the char[] it is being built from is a valid UTF string.
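
A rough sketch of what that proposal could look like, not the actual type
under discussion. The name ValidString and its layout are hypothetical;
std.utf.validate is the existing Phobos routine that throws UTFException on
malformed input.

```d
import std.utf : validate, UTFException;

/// Hypothetical wrapper; the name and layout are illustrative only.
struct ValidString
{
    private immutable(char)[] _data;

    /// Validates the input, then copies it.
    /// Throws UTFException if `raw` is not well-formed UTF-8.
    this(const(char)[] raw)
    {
        validate(raw);     // std.utf.validate throws on invalid sequences
        _data = raw.idup;  // copy, so later writes to `raw` cannot break the invariant
    }

    /// Read-only view of the underlying code units.
    @property immutable(char)[] data() const
    {
        return _data;
    }
}
```

Each construction from a char[] costs one validation pass plus one
allocation for the copy, which is the trade-off this design accepts in
exchange for the invariant.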
Copying is not a good idea, nor is runtime validation. We can only
protect the programmer so much.
The good news is that the vast majority of strings are literals, which
should be properly constructed by the compiler, and immutable.
This looks like a great start for a proper string type. There's still
the issue of literals that would require compiler/language changes.
That is essential: the compiler has to defer the type of string literals
to the library somehow.
There's one other issue that should be considered at some stage:
normalization and the fact that a single "character" can be constructed
from several code points (acute accents and such).
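
To make the issue concrete, here is a small sketch, assuming a Phobos
version whose std.uni provides normalize and byGrapheme (later releases do;
this is not part of the proposed string type). The variable names are
illustrative only.

```d
import std.range : walkLength;
import std.uni : byGrapheme, normalize, NFC;

void main()
{
    string composed   = "\u00E9";   // U+00E9, precomposed e with acute accent
    string decomposed = "e\u0301";  // 'e' followed by U+0301, combining acute accent

    // The two spellings render the same but compare unequal as raw strings.
    assert(composed != decomposed);

    // Two code points, yet a single user-perceived character (grapheme).
    assert(decomposed.walkLength == 2);
    assert(decomposed.byGrapheme.walkLength == 1);

    // After normalizing to NFC the two spellings compare equal.
    assert(normalize!NFC(decomposed) == composed);
}
```

So a string type that indexes by code point still splits such a character
in two unless it either normalizes on construction or iterates by grapheme.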
This is more solvable with a struct, but at this point, I'm not sure if
it's worth worrying about. How common is that need?
-Steve