On Wednesday, 5 October 2022 at 21:50:32 UTC, torhu wrote:
I did some basic testing, and regex was two orders of magnitude faster. So now I know, I guess.

Substring search functionality is currently in a very bad shape in Phobos. I discovered this myself a few weeks ago when I was trying to solve the https://www.facebook.com/codingcompetitions/hacker-cup/2022/round-1/problems/A2 puzzle using D language. A part of the solution requires a fast substring search (actually a subarray search) and Phobos doesn't offer anything with even remotely acceptable performance.

Phobos does have a Boyer-Moore implementation. This algorithm is historically famous in computer science as one of the first attempts to optimize substring search, but it also has pathologically bad performance on certain input data and probably shouldn't be recommended for any practical use nowadays. The users of old versions of Python discovered this too: https://codeforces.com/blog/entry/106849?#comment-952483

The standard 'find' function from the following simple example also becomes awfully slow when arrays 'a' and 'b' are large and/or are adversarially constructed:

```D
import std;
void main() {
  auto a = [1, 2, 3, 4];
  auto b = [2, 3];
  writefln("Is %s a subarray of %s? The answer is %s.",
           b, a, a.find(b).empty ? "no" : "yes");
}
```

I think that the best fit for your use case is the https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm and there's an old issue in bugzilla about this: https://issues.dlang.org/show_bug.cgi?id=16066

BTW, if anyone is curious, one of the possible solutions for the hacker-cup/2022/round-1/problems/A2 puzzle in D language can be found here: https://codeforces.com/blog/entry/106768?#comment-952808

Reply via email to