Re: [review] new string type (take 2)

Steven Schveighoffer Fri, 14 Jan 2011 05:10:52 -0800

On Thu, 13 Jan 2011 23:03:35 -0500, Steven Wawryk <[email protected]> wrote:

> On 14/01/11 02:25, Steven Schveighoffer wrote:
>
>> On Wed, 12 Jan 2011 04:49:26 -0500, Steven Wawryk <[email protected]> wrote:
>>
>>> I like the direction you're taking but have some quibbles about
>>> details. Specifically, I'd go for a more complete separation into
>>> random-access code-unit ranges and bidirectional code-point ranges:
>>
>> Thanks for taking the time. I will respond to your points, but please
>> make your rebuttals to the new thread I'm about to create with an
>> updated string type.
>>
>>> I don't see a need for _charStart, opIndex, opSlice and codeUnits. If
>>> the underlying T[] can be returned by a property, then these can be
>>> done through the code-unit array, which is random-access.
>>
>> But that puts extra pain on the user for not much reason. Currently,
>> strings slice in one operation, you are proposing that we slice in three
>> operations:
>>
>> 1. get the underlying array
>
> myString vs myString.data
>
>> 2. slice it
>
> Same for both.
>
>> 3. reconstruct a string based on the slice.
>
> myOtherString = find(myString, 'x');
> vs
> myOtherString = find(myString.data, 'x');
>
> You may see extra pain. I see extra control. The user is making it
> explicit at what level (code-unit/code-point/grapheme/whatever) of range
> he/she wants the called algorithm to be working on.

Exactly, that is what my string type allows. You can either do it at the
code-point (and probably grapheme, discussion in progress) level, or you
can do it at the code-unit level. I don't see how restricting the user to
only doing it at the code-unit level is not more painful.
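To make the two levels concrete, here is a minimal D sketch using plain arrays and Phobos rather than the proposed type. Phobos' `std.utf.byCodeUnit` stands in for the proposal's `data` property; that substitution is an assumption for illustration only. On a `string`, range algorithms such as `walkLength` and `find` work in code points, while `byCodeUnit` exposes the raw UTF-8 code units.

```d
import std.algorithm.searching : find;
import std.range : walkLength;
import std.utf : byCodeUnit;

void main()
{
    string s = "naïve café"; // 10 code points, 12 UTF-8 code units

    // Code-unit level: .length counts UTF-8 code units (bytes).
    assert(s.length == 12);

    // Code-point level: walkLength over a string decodes as it goes.
    assert(s.walkLength == 10);

    // Code-point level search: the needle is one whole code point.
    dchar eAcute = 'é';
    assert(find(s, eAcute) == "é");

    // Code-unit level search: byCodeUnit yields individual chars and is
    // random-access, so the result's length is measured in code units.
    auto tail = find(s.byCodeUnit, 'c');
    assert(tail.length == 5); // "café" is 5 code units
}
```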

>> Plus, if you remove opIndex, you are restricting the usefulness of the
>> range. Note that this string type already will decode dchars out of the
>> front and back, why not just give that ability to the middle of the
>> string?
>
> Because at the code-point level it *isn't* a random-access range and the
> index makes no sense at the code-point level, only at the code-unit
> level. It's encouraging the confusion of 2 distinctly different
> abstractions or "views" of the same data. All the slicing and indexing
> you're artificially putting in the code-point range is already available
> in the code-unit range, and its only benefit is to allow the user to
> save typing ".data".

I respectfully disagree. A stream built on fixed-size units but with
variable-length elements, where you can determine the start of an element
in O(1) time given a random index, absolutely provides random access. It
just doesn't provide length.
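For UTF-8 the O(1) claim is easy to check with plain D arrays: every continuation byte has the bit pattern 10xxxxxx and a code point has at most three of them, so from any code-unit index you step back at most three bytes to reach the start of the enclosing code point. The sketch below illustrates that idea under the assumption of UTF-8 `char` data (UTF-16 would instead step back over a low surrogate); `codePointStart` and `codePointAt` are hypothetical names, not the proposed type's API.

```d
import std.utf : decode;

/// Hypothetical helper: index of the first code unit of the UTF-8 code
/// point that contains s[i]. Continuation bytes look like 0b10xxxxxx and
/// there are at most three per code point, so the loop runs at most three
/// times: O(1) regardless of s.length.
size_t codePointStart(const(char)[] s, size_t i)
{
    if (i >= s.length)
        throw new Exception("index out of range");
    size_t steps = 0;
    while ((s[i] & 0xC0) == 0x80)
    {
        if (i == 0 || steps == 3)
            throw new Exception("invalid UTF-8: stray continuation byte");
        --i;
        ++steps;
    }
    return i;
}

/// Hypothetical code-point "opIndex": decode the code point containing
/// code unit i. The index is still a code-unit index; the code-point
/// count is not available in O(1), which is the "no length" caveat.
dchar codePointAt(const(char)[] s, size_t i)
{
    size_t pos = codePointStart(s, i);
    return decode(s, pos); // decode also advances pos past the code point
}

void main()
{
    string s = "naïve"; // 'ï' occupies code units 2 and 3
    assert(codePointStart(s, 3) == 2);
    assert(codePointAt(s, 3) == 'ï');
    assert(codePointAt(s, 4) == 'v');

    string t = "a\U0001F600"; // a 4-byte code point starting at index 1
    assert(codePointStart(t, 4) == 1);
    assert(codePointAt(t, 2) == '\U0001F600');
}
```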

You are also forgetting one thing, the main reason why a string type is
better than the array -- it's possible to slice a code-unit array using
indexes that create an invalid range. With my type it is not possible to
do that (it throws an exception). We want the basic user to use strings
properly (and inform them of their errors at the site of the error), and
if an advanced user wants more control, they can jump down to the
code-unit level by accessing the data property.
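For illustration only, the exception-throwing slice described above might look like this in D. `checkedSlice` and `isBoundary` are hypothetical helpers working on a plain UTF-8 `string`, not the real API of the proposed type. The point is where the error surfaces: at the slicing call, rather than later when the broken slice is decoded.

```d
import std.exception : assertThrown, enforce;
import std.utf : UTFException, validate;

/// Hypothetical helper: true if i is a legal cut point in a UTF-8 string,
/// i.e. the very end, or a code unit that is not a continuation byte.
bool isBoundary(const(char)[] s, size_t i)
{
    return i == s.length || (i < s.length && (s[i] & 0xC0) != 0x80);
}

/// Hypothetical checked slice: like s[a .. b], but throws at the call site
/// instead of silently producing a string that splits a code point.
string checkedSlice(string s, size_t a, size_t b)
{
    enforce(a <= b && b <= s.length, "slice bounds out of range");
    enforce(isBoundary(s, a) && isBoundary(s, b),
            "slice would split a UTF-8 code point");
    return s[a .. b];
}

void main()
{
    string s = "café"; // 'é' occupies code units 3 and 4

    assert(checkedSlice(s, 0, 3) == "caf");
    assert(checkedSlice(s, 3, 5) == "é");
    assertThrown(checkedSlice(s, 0, 4)); // error reported right here

    // The raw array slice succeeds; the damage only shows up later,
    // when something tries to decode it.
    string broken = s[0 .. 4];
    assertThrown!UTFException(validate(broken));
}
```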

> - other Steve


hehe, you can be Steve' :)

-Steve
