Re: [up for grabs] Two-way string searching

monarch_dodra Mon, 17 Jun 2013 04:41:20 -0700

On Sunday, 16 June 2013 at 23:09:31 UTC, Andrei Alexandrescu wrote:

On 6/16/13 6:16 PM, Dmitry Olshansky wrote:
It has long bugged me that Phobos std.algorithm.find is slow. Far slower than strstr (esp. on *nix, where it competes against GLIBC[1]).
The current state of the art in searching substring
with O(1) space requirement is Two-Way algorithm:

http://www-igm.univ-mlv.fr/~lecroq/string/node26.html

which is linear in time.
I could imagine it would be interesting to implement a generic version as well. Any takers? I'd gladly go for it myself, but unfortunately I'm way too busy on other fronts (I have planned to do it for a couple of months already).

[1] See a relevant bug report and discussion, e.g. on glibc:
http://sourceware.org/bugzilla/show_bug.cgi?id=5514

Awesome idea! Could you please submit an enhancement request for this? I wonder what kind of ranges that could work on.

Andrei

One of the "problems" is that find is array-agnostic, so it doesn't know how to squeeze all the juice out of arrays, such as:

* No bounds check on accesses
* No bounds check on slicing
* Vectorized comparisons

I took the existing find(RA, BD) code and modified it to operate on find(ARR, ARR).

On average, I'm getting roughly 20%-25% performance improvements (-release -O -inline), although the result is of course highly dependent on the tested input.

This goes to show that just by adapting the existing algorithm to exploit arrays better, there is already room for good improvements.

Given that a string-to-same-width-string search boils down to an integral array search, the same gains would also be had for strings.
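
To make the three points listed above concrete, here is a minimal sketch of what an array-specialized search could look like. It is not the modified Phobos code: findArray is a hypothetical name, and the search itself is a plain brute-force scan rather than the existing find algorithm or Two-Way. It only shows where pointer access avoids bounds checks and where a whole-slice comparison gives the compiler and runtime a chance to compare a block at a time.

```d
// Hypothetical helper for illustration; this is NOT the Phobos implementation.
// Brute-force search specialized for arrays (and same-width strings).
H[] findArray(H, N)(H[] haystack, N[] needle) @trusted
{
    if (needle.length == 0)
        return haystack;
    if (needle.length > haystack.length)
        return haystack[$ .. $];

    auto p = haystack.ptr;            // raw pointer: indexing and slicing are unchecked
    immutable n = needle.length;
    immutable last = haystack.length - n;

    foreach (i; 0 .. last + 1)
    {
        // Cheap first-element test, then one whole-slice comparison
        // instead of an element-by-element loop.
        if (p[i] == needle[0] && p[i .. i + n] == needle)
            return p[i .. haystack.length];
    }
    return haystack[$ .. $];          // not found: empty slice at the end
}
```

With this sketch, findArray("hello world", "wor") yields "world", and an unsuccessful search yields an empty slice, mirroring how find returns the remainder of the haystack.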


--------

I was also able to squeeze out similar performance boosts for find(R, E), with minimal code changes, by exploiting better iteration semantics based on the type being iterated (range, RA array, or narrow string).
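
As a rough illustration of picking the iteration strategy from the type, a sketch along these lines is possible. Again, findElem is a hypothetical name and this is not the actual patch; it assumes the three cases are told apart with isNarrowString, isDynamicArray, and a generic range fallback.

```d
import std.range;            // empty, front, popFront, isInputRange, iota
import std.traits;           // isNarrowString, isDynamicArray
import std.utf : decode;

// Hypothetical helper for illustration; this is NOT the Phobos implementation.
R findElem(R, E)(R haystack, E needle)
    if (isInputRange!R)
{
    static if (isNarrowString!R)
    {
        // Narrow string: decode code points by index instead of
        // going through front/popFront on every step.
        size_t i = 0;
        while (i < haystack.length)
        {
            immutable start = i;
            if (decode(haystack, i) == needle)   // decode advances i
                return haystack[start .. $];
        }
        return haystack[$ .. $];
    }
    else static if (isDynamicArray!R)
    {
        // Random-access array: a plain indexed loop.
        foreach (i, ref e; haystack)
            if (e == needle)
                return haystack[i .. $];
        return haystack[$ .. $];
    }
    else
    {
        // Any other input range: the generic primitives.
        for (; !haystack.empty; haystack.popFront())
            if (haystack.front == needle)
                break;
        return haystack;
    }
}
```

For example, findElem([1, 2, 3, 4], 3) takes the array branch and returns [3, 4], while findElem(iota(0, 10), 7) falls through to the generic branch.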


--------

I can start by improving find(R, E), because it is a small, easy, and effective change.

For find(R, R), things are a bit more dicey to properly integrate, so I don't want to do anything right now.

But the point is that there is still room for substantial improvements without modifying the algorithm too much...
