https://issues.dlang.org/show_bug.cgi?id=12690
--- Comment #2 from Diez <[email protected]> --- Created attachment 1351 --> https://issues.dlang.org/attachment.cgi?id=1351&action=edit Source code for more complex test case (UTF-8) Thank you for your help. I verified the compilation options of my real-world program and in fact the -inline option was missing (-O was already set). However, it didn't change anything at the overall performance, it's still 5 seconds (backtracking) versus 47 seconds (Thompson). With this magnitude in performance difference there's no chance to switch to the Thompson matcher. The attached source is almost the same as before, but with average expression and haystack from my real-world program (confidential informations scrambled). Now it's not a small performance difference, but a factor of more than 5. As I understand the meaning of the word "discouraged", the backtracking matcher is to be removed in some future. Please don't. As long as the Thompson matcher has this poor performance (for certain use cases), there's need for both matchers. In fact, using the Thompson matcher is way slower than an equivalent Java 1.7 program, while using the backtracking matcher is slightly faster. Of course, optimally there's only one matcher which combines the advantages of all underlaying matchers. In my real-world program, there's about 100 times more haystacks than patterns (940 vs. 112_000). As soon as a pattern matches, the next haystack is examined. Each haystack needed about 23 tries (avg) until a pattern matched, actually there were 2_584_925 match calls in total of which 111_675 matched the haystack (a "few" haystacks didn't match any pattern). Note that the attached test case source file needs to be saved in UTF-8 encoding, since the regular expression string contains UTF-8 characters (hopefully, they don't get destroyed during upload into the issue tracking system). The UTF-8 signature comment line should help Windows editors to automatically switch to UTF-8 encoding when opening the source file (like an UTF-8 BOM). The idea of matching multiple patterns with only one call of the matcher is very interesting (though I didn't test it yet). Consider a 5 times faster matcher with 5 matches at once ... --
