On 3/7/2014 6:33 PM, H. S. Teoh wrote:
> On Fri, Mar 07, 2014 at 11:13:50PM +0000, Sarath Kodali wrote:
>> On Friday, 7 March 2014 at 22:35:47 UTC, Sarath Kodali wrote:

>>> +1
>>> In Indian languages, a character consists of one or more Unicode
>>> code points. For example, the Sanskrit conjunct "ddhrya"
>>> http://en.wikipedia.org/wiki/File:JanaSanskritSans_ddhrya.svg
>>> consists of 7 Unicode code points. So to search for this character
>>> I have to use a string search.

>>> - Sarath
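
[Editorial aside, not part of the quoted post: the point above is easy to reproduce in D. In this sketch, "e" plus U+0301 (combining acute accent) stands in for the 7-code-point Sanskrit conjunct; it assumes std.uni.byGrapheme is available, as it is in any recent Phobos.]

import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;

void main()
{
    // One user-perceived character: two code points, three UTF-8 code units.
    string s = "e\u0301";
    writeln(s.length);                // 3 code units
    writeln(s.walkLength);            // 2 code points (auto-decoded)
    writeln(s.byGrapheme.walkLength); // 1 grapheme
}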

>> Oops, incomplete reply ...

>> Since a single "alphabet" in Indian languages can contain multiple
>> code points, iterating over single code points is like iterating
>> over char[] for non-English European languages. So decoding is of
>> no use other than decreasing performance. A raw char[] comparison
>> is much faster.

> Yes. The more I think about it, the more auto-decoding sounds like the
> wrong decision. The question, though, is whether it's worth the
> massive code breakage needed to undo it. :-(


I'm leaning the same way. But I also think Andrei is right that, at this point in time, it'd be a terrible move to change things so that "by code unit" is the default. For better or worse, that ship has sailed.

Perhaps we *can* deal with the auto-decoding problem not by killing auto-decoding, but by marginalizing it in an additive way:

Convincing arguments have been made that any string-processing code which *isn't* done entirely with the official Unicode algorithms is likely wrong *regardless* of whether std.algorithm defaults to per-code-unit or per-code-point operation.
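
To make that concrete (an untested sketch): these two spellings of "é" are canonically equivalent, yet they compare unequal at both the code-unit and the code-point level. Only a genuine Unicode algorithm, normalization via std.uni.normalize here, gives the right answer:

import std.stdio : writeln;
import std.uni : NFC, normalize;

void main()
{
    string precomposed = "\u00E9";  // é as a single code point
    string decomposed = "e\u0301";  // e followed by a combining acute accent

    writeln(precomposed == decomposed);                               // false
    writeln(normalize!NFC(precomposed) == normalize!NFC(decomposed)); // true
}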

So... how's this: We add whichever of these Unicode algorithms we're still missing, encourage their use for strings, discourage use of std.algorithm for string processing, and in the meantime just do our best to reduce unnecessary decoding wherever possible. Then we call it a day and everyone's happy :)
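
For instance, a grapheme-aware search (again, just an untested sketch) would handle exactly the case Sarath raised, finding a user-perceived character even when it spans several code points:

import std.algorithm : canFind, equal;
import std.stdio : writeln;
import std.uni : byGrapheme;

void main()
{
    string text = "re\u0301sume\u0301"; // "résumé" in decomposed form

    // Look for the two-code-point grapheme: "e" + combining acute accent.
    bool found = text.byGrapheme.canFind!(g => g[].equal("e\u0301"));
    writeln(found); // true
}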
