Re: Major performance problem with std.array.front()

Andrei Alexandrescu Sat, 08 Mar 2014 12:11:21 -0800

On 3/8/14, 9:33 AM, Sean Kelly wrote:

On Saturday, 8 March 2014 at 00:22:05 UTC, Walter Bright wrote:

Andrei suggests that this change would destroy D by breaking too much
existing code. He might be right. Can we afford the risk that he is
right?


Perhaps not.  But I think the current approach is totally broken, it
just also happens to be what people have coded to.

I think that's an exaggeration poorly supported by evidence. My definition of "totally broken" would be "essentially unusable" and I think we're well past the point we need to prove that. Virtually all applications need to deal with strings to some extent, and I myself wrote a couple of relatively string-heavy ones. D strings work well. Even the most ardent detractors of D on e.g. reddit.com admit by omission that string processing is one of its strengths. Though they'll probably pick up on this thread soon :o).

Andrei used
algorithms operating on a code point level as an example of what would
break if this change were made, and in that he's absolutely correct.
But what bothers me is whether it's appropriate to perform this sort of
character-based operation on Unicode strings in the first place.

Searching for characters in strings would be difficult to deem inappropriate.
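To make the distinction concrete, here is a minimal sketch of a character search at the two levels under discussion. It assumes a Phobos release that provides std.utf.byCodeUnit; the helper names codePointIndex and codeUnitIndex are hypothetical and exist only for this illustration.

```d
import std.algorithm.searching : countUntil;
import std.utf : byCodeUnit;

// Hypothetical helper: position of `needle` counted in code points,
// i.e. what countUntil reports on a plain (auto-decoded) string.
ptrdiff_t codePointIndex(string haystack, dchar needle)
{
    return haystack.countUntil(needle);
}

// Hypothetical helper: position of an ASCII `needle` counted in
// UTF-8 code units, using the explicit byCodeUnit view.
ptrdiff_t codeUnitIndex(string haystack, char needle)
{
    return haystack.byCodeUnit.countUntil(needle);
}

void main()
{
    // U+00E9 occupies two UTF-8 code units, so the two counts diverge after it.
    string s = "caf\u00E9 au lait";

    assert(codePointIndex(s, '\u00E9') == 3);
    assert(codePointIndex(s, 'l') == 8);
    assert(codeUnitIndex(s, 'l') == 9);
}
```

The code point result is the one that matches what a reader would count by eye in this string; the code unit result is the one that is valid as an index into s.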

When I designed std.algorithm I recall I put the following options on the table:

1. All algorithms would by default operate on strings at char/wchar level (i.e. code unit). That would cause the usual issues and confusions I was aware of from C++. Certain algorithms would require specialization and/or the user using byDchar for correctness. At some point I swear I've had a byDchar definition somewhere; I've searched the recent git history for it, to no avail.

2. All algorithms would by default operate at code point level. That way correctness would be achieved by default, and certain algorithms would require specialization for efficiency. (Back then I didn't know about graphemes and normalization. I'm not sure how that would have affected the final decision.)

3. Change the alias string, wstring etc. to be some type that requires explicit access for code units/code points etc. instead of implicitly mixing the two.
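The three options can be sketched side by side. This is an illustration under stated assumptions, not the design that was on the table: it relies on std.utf.byCodeUnit and std.utf.byDchar as found in later Phobos releases, and the Text struct is a hypothetical stand-in for the kind of type option (3) describes.

```d
import std.range : walkLength;
import std.uni : byGrapheme;
import std.utf : byCodeUnit, byDchar;

// Hypothetical type in the spirit of option (3): it offers no implicit
// iteration, so the caller has to name the level they want.
struct Text
{
    private string data;

    auto codeUnits()  { return data.byCodeUnit; }  // option (1)'s view
    auto codePoints() { return data.byDchar; }     // option (2)'s view
    auto graphemes()  { return data.byGrapheme; }  // user-perceived characters
}

void main()
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT: one visible character.
    string s = "e\u0301";

    assert(s.length == 3);      // UTF-8 code units
    assert(s.walkLength == 2);  // code points, via the default decoding

    auto t = Text(s);
    assert(t.codeUnits.walkLength == 3);
    assert(t.codePoints.walkLength == 2);
    assert(t.graphemes.walkLength == 1);
}
```

Note that even the code point view counts two elements for what a user sees as one character, which is the grapheme issue mentioned under (2).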

My fave was (3). And not mine only - several people suggested alternative definitions of the "default" string type. Back then however we were in the middle of the D1/D2 transition and one more aftershock didn't seem like a good idea at all. Walter opposed such a change, and didn't really have to convince me.

From experience with C++ I knew (1) had a bad track record, and (2) "generically conservative, specialize for speed" was a successful pattern.


What would you have chosen given that context?

The current approach is a cut above treating strings as arrays of bytes
for some languages, and still utterly broken for others. If I'm
operating on a right to left language like Hebrew, what would I expect
the result to be from something like countUntil?

The entire string processing paraphernalia is left to right. I figure RTL languages are under-supported, but s.retro.countUntil comes to mind.
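A minimal sketch of what s.retro.countUntil yields on a Hebrew string, for reference. The helper names fromStart and fromEnd are hypothetical. The sketch assumes the text is stored in logical (reading) order, as Unicode text normally is, so "left to right" here means first-read to last-read rather than visual position.

```d
import std.algorithm.searching : countUntil;
import std.range : retro;

// Hypothetical helper: code points preceding `needle` in logical order.
ptrdiff_t fromStart(string s, dchar needle)
{
    return s.countUntil(needle);
}

// Hypothetical helper: code points between the logical end and `needle`.
ptrdiff_t fromEnd(string s, dchar needle)
{
    return s.retro.countUntil(needle);
}

void main()
{
    // "shalom": shin, lamed, vav, final mem.
    string s = "\u05E9\u05DC\u05D5\u05DD";

    assert(s.length == 8);                 // UTF-8 code units, for contrast
    assert(fromStart(s, '\u05DD') == 3);   // final mem is read last
    assert(fromEnd(s, '\u05DD') == 0);     // and is first when walking backwards
    assert(fromStart(s, '\u05E9') == 0);   // shin is read first
}
```

Whether either number is useful to a caller depends on whether they care about reading order or on-screen position; the sketch only shows what the existing primitives return.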

And how useful would
such a result be?


I don't know.

I'm inclined to say that the correct approach is to
state that algorithms operate explicitly on a T.sizeof basis and that if
the data contained in a particular range has some multi-element encoding
then separate, specialized routines should be used when the T.sizeof
behavior will not produce the desired result.

That sounds quite like C++ plus ICU. It doesn't strike me as the golden standard for Unicode integration.

So the problem to me is that we're stuck not fixing something that's
horribly broken just because it's broken in a way that people presumably
now expect.

Clearly I'm being subjective here but again I'd find it difficult to get convinced we have something horribly broken from the evidence I gathered inside and outside Facebook.

I'd personally like to see this fixed and I think the new behavior is
preferable overall, but I do share Andrei's concern that such a big
change might hurt the language anyway.

I've said this once and I'm saying it again: the best way to convert this discussion into something useful is to devise ideas for useful non-breaking additions.



Andrei
