Re: [review] new string type

Steven Schveighoffer Thu, 02 Dec 2010 13:25:22 -0800

On Wed, 01 Dec 2010 21:13:35 -0500, Ellery Newcomer<[email protected]> wrote:

On 12/01/2010 03:35 PM, Steven Schveighoffer wrote:
On Tue, 30 Nov 2010 18:31:05 -0500, Ellery Newcomer
There definitely is value in being able to index and slice into utf
strings without resulting in invalid utf, but I think the fact that it
indexes on code unit and returns code point is sufficiently strange
that it qualifies as abuse of operator overloading.
Maybe :) The other alternative is to throw an exception if you try to
access a code unit that is not the beginning of a code point.

That might actually be less weird, I'll try doing that on the next
iteration.
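
A minimal sketch of the stricter behaviour, for illustration only. The free function codePointAt is a hypothetical name and is not part of the string type under review; it assumes UTF-8 input and throws a plain Exception when the index lands inside a multi-byte sequence.

```d
import std.utf : decode;

/// Hypothetical helper: returns the code point that starts at code unit i,
/// and throws if i points into the middle of a multi-byte sequence.
dchar codePointAt(string s, size_t i)
{
    // UTF-8 continuation bytes always have the bit pattern 10xxxxxx.
    if ((s[i] & 0xC0) == 0x80)
        throw new Exception("index is not the start of a code point");
    return decode(s, i); // decode advances the local copy of i
}
```

With s = "a\u2729b", indexes 0, 1 and 4 start code points, while 2 and 3 would throw.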
In my mind, the problem isn't so much indexing an intermediate code unit gets you earlier code units (it's a little strange, and I'm not sure whether greater strictness would be better - on the one hand, less strictness would be more tolerant of bugs and make it that much more difficult to detect them, but on the other hand if you were doing something like getting a random or approximate slice into your string, less strictness would mean that much less annoyance, though I have no idea why you would want to do that) as it is just the difference between the two and the confusion that it's bound to cause the noobies.

Yes, it does seem odd, but then again, how often do you need the individual characters of a string? I wrote PHP code for about 6 months as a full time job before I found I needed to access individual characters, and then I had to look up how to do it :) It's just not a common thing.

Typically, the index you use is calculated from something like find, and you don't care what it is, as long as it's storable and persistent.
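
As a rough illustration with built-in strings: std.string.indexOf returns a code unit offset that always falls on the start of a code point, so it can be stored and used for slicing without ever being inspected. The helper name tailFrom is made up for this sketch.

```d
import std.string : indexOf;

/// Hypothetical helper: the part of s starting at the first occurrence
/// of c, or an empty string if c does not occur.
string tailFrom(string s, dchar c)
{
    auto i = s.indexOf(c); // code unit offset, or -1 if absent
    return i < 0 ? null : s[i .. $];
}
```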

I find that iteration over string characters using index is a very rare
thing anyways, you either use foreach, which should give you dchars, or
you use something like find, which should never give you an invalid index.
-Steve
find was the counterargument I had in mind for keeping the operator overload, as something like
s[find(s,'\u2729') .. s.codeUnits]

is just a bit better than

s.codePointSliceAt(find(s,'\u2729'), s.codeUnits);

I really don't know.


Ugh, yes, and actually, that reminds me I should define opDollar.
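
A sketch of what that could look like, assuming a wrapper struct around a UTF-8 string. Utf8Str is a hypothetical stand-in for the type under review, and the slice does no validation here.

```d
/// Hypothetical wrapper, only to illustrate opDollar.
struct Utf8Str
{
    string data;

    /// Lets `$` inside s[...] mean the length in code units.
    size_t opDollar() { return data.length; }

    /// Slices by code unit offsets; validation is omitted in this sketch.
    string opSlice(size_t from, size_t to) { return data[from .. to]; }
}
```

With this in place the earlier slice could end in `$` instead of s.codeUnits.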

One thing that strikes me, though, if you're going to keep opIndex, is that being able to do
foreach(size_t codeuniti, dchar c; s){

}
would be nice. Actually, it looks like you can do that with current strings.
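
For reference, a small sketch of that behaviour with the built-in string type: the index counts code units while the element is a decoded code point. The function name codePointStarts is invented for this example.

```d
/// Collects the code unit offset at which each code point of s starts.
size_t[] codePointStarts(string s)
{
    size_t[] starts;
    foreach (size_t i, dchar c; s) // i counts code units, c is decoded
        starts ~= i;
    return starts;
}
```

For "a\u2729b" the offsets skip from 1 to 4, because U+2729 takes three code units in UTF-8.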

At this point, you can't do that except via opApply, and I didn't want to inject that in fear that it would be pointed out as a drawback.
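
A minimal sketch of the opApply route, assuming a hypothetical wrapper struct Utf8View (not the type under review). It hands the loop body the starting code unit offset together with the decoded code point.

```d
import std.utf : decode;

/// Hypothetical wrapper type, used only to illustrate opApply.
struct Utf8View
{
    string data;

    /// Enables: foreach (size_t i, dchar c; view) { ... }
    int opApply(scope int delegate(ref size_t, ref dchar) dg)
    {
        size_t i = 0;
        while (i < data.length)
        {
            size_t start = i;
            dchar c = decode(data, i); // advances i past the code point
            if (auto r = dg(start, c))
                return r; // non-zero means the loop body hit break/return
        }
        return 0;
    }
}
```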


It would be nice if we could define a way to do that via ranges...

-Steve
