On Thu, 13 Jan 2011 23:03:35 -0500, Steven Wawryk <[email protected]> wrote:

On 14/01/11 02:25, Steven Schveighoffer wrote:
> On Wed, 12 Jan 2011 04:49:26 -0500, Steven Wawryk <[email protected]>
 > wrote:
 >
 >>
 >> I like the direction you're taking but have some quibbles about
 >> details. Specifically, I'd go for a more complete separation into
 >> random-access code-unit ranges and bidirectional code-point ranges:
 >
 > Thanks for taking the time. I will respond to your points, but please
 > make your rebuttals to the new thread I'm about to create with an
 > updated string type.
 >
 >> I don't see a need for _charStart, opIndex, opSlice and codeUnits. If
 >> the underlying T[] can be returned by a property, then these can be
 >> done through the code-unit array, which is random-access.
 >
 > But that puts extra pain on the user for not much reason. Currently,
> strings slice in one operation, you are proposing that we slice in three
 > operations:
 >
 > 1. get the underlying array

myString vs myString.data

 > 2. slice it

Same for both.

 > 3. reconstruct a string based on the slice.

myOtherString = find(myString, 'x');
vs
myOtherString = find(myString.data, 'x');

You may see extra pain. I see extra control. The user is making it explicit at what level (code-unit/code-point/grapheme/whatever) of range he/she wants the called algorithm to be working on.

Exactly, that is what my string type allows. You can either do it at the code-point (and probably grapheme, discussion in progress) level, or you can do it at the code-unit level. I don't see how restricting the user to only doing it at the code-unit level is not more painful.

 > Plus, if you remove opIndex, you are restricting the usefulness of the
 > range. Note that this string type already will decode dchars out of the
 > front and back, why not just give that ability to the middle of the string?

Because at the code-point level it *isn't* a random-access range and the index makes no sense at the code-point level, only at the code-unit level. It's encouraging the confusion of 2 distinctly different abstractions or "views" of the same data. All the slicing and indexing you're artificially putting in the code-point range is already available in the code-unit range, and its only benefit is to allow the user to save typing ".data".

I respectfully disagree. A stream built on fixed-size units but with variable-length elements, where you can find the start of an element in O(1) time given a random index, absolutely provides random access. It just doesn't provide length.
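As a minimal sketch of that claim for UTF-8 (the helper names charStart and codePointAt are made up for illustration and are not the _charStart from the proposed type, whose code is not shown in this thread): a continuation byte always has the bit pattern 10xxxxxx, and a valid code point has at most three of them, so backing up to the lead byte takes a bounded number of steps.

```d
import std.utf : decode;

// Hypothetical helper: given any code-unit index into valid UTF-8, return
// the index of the first code unit of the code point that contains it.
// The loop runs at most three times on valid input, hence O(1).
size_t charStart(const(char)[] s, size_t i)
{
    while (i > 0 && (s[i] & 0xC0) == 0x80) // 0b10xxxxxx marks a continuation byte
        --i;
    return i;
}

// Hypothetical helper: decode the code point that covers code-unit index i.
dchar codePointAt(const(char)[] s, size_t i)
{
    size_t start = charStart(s, i);
    return decode(s, start); // decode advances `start` past the code point
}

void main()
{
    string s = "a\u00E9z"; // code units: 'a', 0xC3, 0xA9, 'z'
    assert(charStart(s, 2) == 1);           // index 2 is the second byte of U+00E9
    assert(codePointAt(s, 2) == '\u00E9');
    assert(codePointAt(s, 3) == 'z');
}
```

The index here is still a code-unit index, which is why such a range can offer indexing without offering a code-point length.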

You are also forgetting one thing, the main reason why a string type is better than the array: it's possible to slice a code-unit array using indexes that fall in the middle of a code point, which creates an invalid range. With my type it is not possible to do that (it throws an exception). We want the basic user to use strings properly (and inform them of their errors at the site of the error), and if an advanced user wants more control, they can jump down to the code-unit level by accessing the data property.
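To illustrate the difference, here is a rough sketch, assuming a hypothetical free function checkedSlice that stands in for the string type's opSlice (the real type's checks and exception type may well differ). A plain array slice happily cuts a code point in half and the error only surfaces later, while the checked version fails at the slicing site.

```d
import std.exception : assertThrown;
import std.utf : UTFException, validate;

// Hypothetical checked slice: reject boundaries that land inside a code point.
// This is one possible policy, not the proposed type's actual implementation.
const(char)[] checkedSlice(const(char)[] s, size_t lo, size_t hi)
{
    // A valid boundary is the end of the data or any non-continuation byte.
    static bool isBoundary(const(char)[] s, size_t i)
    {
        return i == s.length || (s[i] & 0xC0) != 0x80;
    }

    if (lo > hi || hi > s.length || !isBoundary(s, lo) || !isBoundary(s, hi))
        throw new Exception("slice boundary is not at the start of a code point");
    return s[lo .. hi];
}

void main()
{
    string s = "a\u00E9z"; // code units: 'a', 0xC3, 0xA9, 'z'

    // The raw array slice succeeds, but the result ends mid code point,
    // so the problem only shows up when something decodes it.
    auto raw = s[0 .. 2];
    assertThrown!UTFException(validate(raw));

    // The checked slice accepts code-point boundaries...
    assert(checkedSlice(s, 0, 3) == "a\u00E9");
    // ...and reports the bad index where the mistake was made.
    assertThrown(checkedSlice(s, 0, 2));
}
```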

- other Steve

hehe, you can be Steve' :)

-Steve
