On Tue, 30 Nov 2010 18:34:11 -0500, Lars T. Kyllingstad <[email protected]> wrote:

On Tue, 30 Nov 2010 13:52:20 -0500, Steven Schveighoffer wrote:

On Tue, 30 Nov 2010 13:34:50 -0500, Jonathan M Davis
<[email protected]> wrote:

[...]

4. Indexing is no longer O(1), which violates the guarantees of the
index operator.

Indexing is still O(1).

5. Slicing (other than a full slice) is no longer O(1), which violates
the guarantees of the slicing operator.

Slicing is still O(1).

[...]

It feels extremely weird that the indices refer to code units and not
code points.  If I write

  auto str = mystring("hæ?");
  writeln(str[1], " ", str[2]);

I expect it to print "æ ?", not "æ æ" like it does now.

I don't think any implementation can do that and still keep indexing O(1). With a variable-width encoding it just isn't possible, unless you want to use dchar[].

But your point is well taken. I think what I'm going to do is throw an exception when accessing an invalid index. While also surprising, it doesn't result in "extra data". I feel it's probably very rare to access hard-coded indexes like that, or to walk the characters with an index-based for-loop, unless you are sure of the data in the string.
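
A minimal sketch of the "throw on an invalid index" idea, assuming a UTF-8 backing array. The name CodePointView and its layout are hypothetical and are not the type discussed in this thread; the index is still a code-unit offset, and decoding reads at most four code units, so the lookup stays O(1).

```d
import std.exception : assertThrown, enforce;
import std.stdio : writeln;
import std.utf : decode;

// Hypothetical wrapper, for illustration only.
struct CodePointView
{
    string data;

    // i is a code-unit offset, not a code-point count.
    dchar opIndex(size_t i)
    {
        enforce(i < data.length, "index out of range");
        // A UTF-8 continuation byte has the bit pattern 10xxxxxx,
        // so it can never be the start of a code point.
        enforce((data[i] & 0xC0) != 0x80,
                "index is in the middle of a code point");
        size_t pos = i;
        return decode(data, pos); // decodes the code point starting at i
    }
}

void main()
{
    auto s = CodePointView("hæ?");
    writeln(s[1]);       // the code point starting at offset 1
    assertThrown(s[2]);  // offset 2 is the second code unit of 'æ'
    writeln(s[3]);       // the code point starting at offset 3
}
```

With this approach str[2] in the example above fails loudly instead of returning the same character twice.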

On a side note:  It seems to me that the only reason to have char, wchar,
and dchar as separate types in the language is that arrays of said types
are UTF-encoded strings.  If a type such as the proposed one were to
become the default string type in D, it might as well wrap an array of
ubyte/ushort/uint, since direct user manipulation of the underlying array
will generally only happen in the rare cases when one wants to deal
directly with code units.

I'd still want a char[] array for easy manipulation and eventual printing. Wrapping a ubyte[] with a string just to print would be strange. My goal in this exercise is to give control over which one to deal with (code-points vs. code-units) back to the user. Right now, the library forces you to view them as code-points, and the compiler forces you to view them as code-units (except via foreach).
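
To illustrate the split described above, here is a small sketch using a plain string. It assumes Phobos' range primitives decode narrow strings to code points, which is the library behavior being referred to.

```d
import std.range : walkLength;
import std.stdio : writeln;

void main()
{
    string s = "hæ?";

    // Compiler's view: length and indexing count UTF-8 code units.
    writeln(s.length);      // 4 ('æ' takes two code units)

    // Library's view: range primitives decode to code points.
    writeln(s.walkLength);  // 3

    // foreach with an explicit dchar loop variable decodes as it goes.
    foreach (dchar c; s)
        writeln(c);
}
```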

But it is an interesting idea (removing small char types).

-Steve
