On Wednesday, 5 October 2022 at 21:50:32 UTC, torhu wrote:
I did some basic testing, and regex was two orders of magnitude
faster. So now I know, I guess.
Substring search functionality is currently in a very bad shape
in Phobos. I discovered this myself a few weeks ago when I was
trying to solve the
https://www.facebook.com/codingcompetitions/hacker-cup/2022/round-1/problems/A2 puzzle using D language. A part of the solution requires a fast substring search (actually a subarray search) and Phobos doesn't offer anything with even remotely acceptable performance.
Phobos does have a Boyer-Moore implementation. This algorithm is
historically famous in computer science as one of the first
attempts to optimize substring search, but it also has
pathologically bad performance on certain input data and probably
shouldn't be recommended for any practical use nowadays. The
users of old versions of Python discovered this too:
https://codeforces.com/blog/entry/106849?#comment-952483
The standard 'find' function from the following simple example
also becomes awfully slow when arrays 'a' and 'b' are large
and/or are adversarially constructed:
```D
import std;
void main() {
auto a = [1, 2, 3, 4];
auto b = [2, 3];
writefln("Is %s a subarray of %s? The answer is %s.",
b, a, a.find(b).empty ? "no" : "yes");
}
```
I think that the best fit for your use case is the
https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm and there's an old issue in bugzilla about this: https://issues.dlang.org/show_bug.cgi?id=16066
BTW, if anyone is curious, one of the possible solutions for the
hacker-cup/2022/round-1/problems/A2 puzzle in D language can be
found here:
https://codeforces.com/blog/entry/106768?#comment-952808