Re: [review] new string type

Steven Schveighoffer Wed, 01 Dec 2010 13:45:35 -0800

On Tue, 30 Nov 2010 18:34:11 -0500, Lars T. Kyllingstad <[email protected]> wrote:

On Tue, 30 Nov 2010 13:52:20 -0500, Steven Schveighoffer wrote:

On Tue, 30 Nov 2010 13:34:50 -0500, Jonathan M Davis
<[email protected]> wrote:

[...]

4. Indexing is no longer O(1), which violates the guarantees of the
index operator.


Indexing is still O(1).

5. Slicing (other than a full slice) is no longer O(1), which violates
the guarantees of the slicing operator.


Slicing is still O(1).

[...]


It feels extremely weird that the indices refer to code units and not
code points.  If I write

  auto str = mystring("hæ?");
  writeln(str[1], " ", str[2]);

I expect it to print "æ ?", not "æ æ" like it does now.

I don't think any implementation can do that while keeping indexing O(1), unless you want to use dchar[].
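
To illustrate the trade-off, here is a minimal sketch, assuming UTF-8 input and a recent Phobos. The helper nthCodePoint is hypothetical and is not part of the proposed type; it only shows why indexing by code point has to walk the string, while dchar[] gets O(1) indexing at four bytes per character.

```d
import std.conv : to;
import std.utf : decode;

// Hypothetical helper (not from the proposal): returns the n-th code
// point of a UTF-8 string. It must decode from the start, so it is O(n).
dchar nthCodePoint(string s, size_t n)
{
    size_t i = 0;                 // code unit index
    foreach (_; 0 .. n)
        decode(s, i);             // skip one code point, advancing i
    return decode(s, i);
}

void main()
{
    string s = "hæ?";
    assert(s.length == 4);                // code units: 'æ' takes two
    assert(nthCodePoint(s, 1) == 'æ');
    assert(nthCodePoint(s, 2) == '?');

    dstring d = to!dstring(s);            // one code unit per code point
    assert(d.length == 3);
    assert(d[2] == '?');                  // O(1) indexing
}
```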

But your point is well taken. I think what I'm going to do is throw an exception when accessing an invalid index. While also surprising, it doesn't result in "extra data". I feel it's probably very rare to just access hard-coded indexes like that unless you are sure of the data in the string. Or to use a for-loop to access characters, etc.
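
A rough sketch of that behavior, assuming a recent Phobos where std.utf.decode throws UTFException when the index does not point at the start of a sequence. The accessor codePointAt is a hypothetical stand-in, not the actual implementation of the proposed type.

```d
import std.exception : assertThrown;
import std.utf : decode, UTFException;

// Hypothetical accessor: i is a code unit index, as in the proposal.
dchar codePointAt(string s, size_t i)
{
    // decode throws UTFException if s[i] does not start a sequence
    return decode(s, i);
}

void main()
{
    string s = "hæ?";
    assert(codePointAt(s, 1) == 'æ');              // index 1 starts 'æ'
    assertThrown!UTFException(codePointAt(s, 2));  // middle of 'æ'
    assert(codePointAt(s, 3) == '?');
}
```

With this, str[2] from the example above would throw rather than return 'æ' a second time.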

On a side note:  It seems to me that the only reason to have char, wchar,
and dchar as separate types in the language is that arrays of said types
are UTF-encoded strings.  If a type such as the proposed one were to
become the default string type in D, it might as well wrap an array of
ubyte/ushort/uint, since direct user manipulation of the underlying array
will generally only happen in the rare cases when one wants to deal
directly with code units.

I'd still want a char[] array for easy manipulation and eventual printing. Wrapping a ubyte[] with a string just to print would be strange. My goal in this exercise is to try and give control of what to deal with (code-points vs. code-units) back to the user. Right now, the library forces you to view them as code-points, and the compiler forces you to view them as code-units (except via foreach).
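
The split between the two views can be seen in a short sketch, assuming a recent Phobos in which ranges over narrow strings decode to dchar:

```d
import std.range : ElementType, walkLength;

void main()
{
    string s = "hæ?";

    // Compiler view: length and indexing count code units.
    assert(s.length == 4);
    static assert(is(typeof(s[0]) == immutable(char)));

    // Library view: ranges over a string yield decoded code points.
    assert(s.walkLength == 3);
    static assert(is(ElementType!string == dchar));

    // foreach lets the caller choose by naming the element type.
    size_t units, points;
    foreach (char c; s)  ++units;
    foreach (dchar c; s) ++points;
    assert(units == 4 && points == 3);
}
```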


But it is an interesting idea (removing small char types).

-Steve
