On Friday, 24 April 2015 at 18:28:16 UTC, Guillaume wrote:
Hello, I'm trying to make a regex comparison with D, based off
of this article: https://swtch.com/~rsc/regexp/regexp1.html
I've written my code like so:
import std.stdio, std.regex;
void main(string argv[]) {
string m = argv[1];
auto p =
ctRegex!("a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa");
if (match(m, p)) {
writeln("match");
} else {
writeln("no match");
}
}
And the compiler goes into swap. Doing it at runtime is no
better. I was under the impression that this particular regex
was used for showcasing the Thompson NFA which D claims to be
using.
A quick investigation shows that it gets stuck at the end of
pattern compilation stage.
The problem is that as a last pass D's regex goes to optimize the
pattern to construct simple bit-scanning engine as approximation
for prefix of original pattern. And that process is a lot like
Thompson NFA ... _BUT_ the trick of merging equivalent threads
wasn't applied there.
So in short: file a bug, optimizer absolutely should do
de-duplication of threads.
The golang code version of this runs fine, which makes me think
that maybe D isn't using the correct regex engine for this
particular regex. Or perhaps I'm using this wrong?
It uses 2 kinds of engines, run-time one is Thompson NFA.
Compile-time is (for now) still backtracking.
---
Dmitry Olshansky