On Thu, 13 Jan 2011 23:03:35 -0500, Steven Wawryk <[email protected]>
wrote:
> On 14/01/11 02:25, Steven Schveighoffer wrote:
>> On Wed, 12 Jan 2011 04:49:26 -0500, Steven Wawryk <[email protected]>
>> wrote:
>>
>>>
>>> I like the direction you're taking but have some quibbles about
>>> details. Specifically, I'd go for a more complete separation into
>>> random-access code-unit ranges and bidirectional code-point ranges:
>>
>> Thanks for taking the time. I will respond to your points, but please
>> make your rebuttals to the new thread I'm about to create with an
>> updated string type.
>>
>>> I don't see a need for _charStart, opIndex, opSlice and codeUnits. If
>>> the underlying T[] can be returned by a property, then these can be
>>> done through the code-unit array, which is random-access.
>>
>> But that puts extra pain on the user for not much reason. Currently,
>> strings slice in one operation, you are proposing that we slice in three
>> operations:
>>
>> 1. get the underlying array
>
> myString vs myString.data
>
>> 2. slice it
>
> Same for both.
>
>> 3. reconstruct a string based on the slice.
>
> myOtherString = find(myString, 'x');
>
> vs
>
> myOtherString = find(myString.data, 'x');
>
> You may see extra pain. I see extra control. The user is making it
> explicit at what level (code-unit/code-point/grapheme/whatever) of range
> he/she wants the called algorithm to be working on.

Exactly, that is what my string type allows. You can either do it at the
code-point (and probably grapheme, discussion in progress) level, or you
can do it at the code-unit level. I don't see how restricting the user to
only doing it at the code-unit level is not more painful.

>> Plus, if you remove opIndex, you are restricting the usefulness of the
>> range. Note that this string type already will decode dchars out of the
>> front and back, why not just give that ability to the middle of the string?
>
> Because at the code-point level it *isn't* a random-access range and the
> index makes no sense at the code-point level, only at the code-unit
> level. It's encouraging the confusion of 2 distinctly different
> abstractions or "views" of the same data. All the slicing and indexing
> you're artificially putting in the code-point range is already available
> in the code-unit range, and its only benefit is to allow the user to
> save typing ".data".

I respectfully disagree. A stream built on fixed-sized units, but with
variable length elements, where you can determine the start of an element
in O(1) time given a random index absolutely provides random-access. It
just doesn't provide length.
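
To make the O(1) claim concrete, here is a minimal sketch, assuming the
data is well-formed UTF-8: any index is at most three code units past the
start of its code point, so backing up takes a bounded number of steps.
The helper names charStart and codePointAt are hypothetical, chosen for
illustration; they are not the API of the proposed type.

```d
import std.utf : decode;

// Hypothetical helper: back up from an arbitrary code-unit index to the
// first code unit of the code point containing it. Assumes valid UTF-8.
size_t charStart(const(char)[] data, size_t i)
{
    assert(i < data.length);
    // UTF-8 continuation bytes have the bit pattern 10xxxxxx.
    while (i > 0 && (data[i] & 0xC0) == 0x80)
        --i;
    return i;
}

// Hypothetical helper: decode the code point that covers code unit i.
dchar codePointAt(const(char)[] data, size_t i)
{
    size_t start = charStart(data, i);
    return decode(data, start); // decode advances `start` past the code point
}

void main()
{
    string s = "a\u00E9\u20ACz"; // 1 + 2 + 3 + 1 = 7 code units
    assert(s.length == 7);
    assert(charStart(s, 4) == 3);           // index 4 is inside the 3-unit code point
    assert(codePointAt(s, 4) == '\u20AC');
    assert(codePointAt(s, 2) == '\u00E9');
}
```

Note what is missing: nothing here can tell you how many code points the
string holds without walking it, which is the "doesn't provide length"
part.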
You are also forgetting one thing, the main reason why a string type is
better than the array -- it's possible to slice a code-unit array using
indexes that create an invalid range. With my type it is not possible to
do that (it throws an exception). We want the basic user to use strings
properly (and inform them of their errors at the site of the error), and
if an advanced user wants more control, they can jump down to the
code-unit level by accessing the data property.
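
A minimal sketch of that difference, assuming UTF-8 data. CheckedString is
a hypothetical stand-in written for this example, not the actual type from
the proposal; only the data property name and the throwing behaviour are
taken from the discussion above.

```d
import std.exception : assertThrown, enforce;

// Hypothetical wrapper for illustration only.
struct CheckedString
{
    string data; // the raw code-unit view, always available

    // True if i is a valid place to cut: the end of the data, or the
    // first code unit of a code point (not a 10xxxxxx continuation byte).
    private bool onBoundary(size_t i)
    {
        return i == data.length || (data[i] & 0xC0) != 0x80;
    }

    // Slice by code-unit index, but refuse to split a code point.
    CheckedString opSlice(size_t lo, size_t hi)
    {
        enforce(lo <= hi && hi <= data.length, "slice out of bounds");
        enforce(onBoundary(lo) && onBoundary(hi), "slice splits a code point");
        return CheckedString(data[lo .. hi]);
    }
}

void main()
{
    auto s = CheckedString("a\u00E9z");  // the middle code point is units 1 and 2
    assert(s[1 .. 3].data == "\u00E9");  // aligned slice: accepted
    assertThrown(s[2 .. 3]);             // starts mid-code-point: rejected on the spot
    assert(s.data[2 .. 3].length == 1);  // the raw array accepts the same indexes silently
}
```

The last line is the advanced-user escape hatch: going through data skips
the check, and the result is an array that is no longer valid UTF-8.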

> - other Steve

hehe, you can be Steve' :)
-Steve