On Sunday, 10 September 2017 at 18:54:21 UTC, Chad Joan wrote:
On Sunday, 10 September 2017 at 11:47:02 UTC, Dmitry Olshansky
wrote:
Yeah, well known problem. Solution is to keep a bit of memory
cached eg in TLS variable.
Indeed.
Is there another issue I can mark it as a duplicate of?
No, it's just something that showed up in a number of benchmarks
people posted; I never got around to it. So just file it so I'll
have an issue to tick once I fix this.
[...]
-- The Captures struct does not specify what value is
returned for submatches that were in the branch of an
alternation that wasn't taken or in a repetition that matched
0 or more than 1 times.
As in every engine out there, the value is "", the empty string.
I usually don't refer to other libraries while using a library.
If an API doesn't define something, then it is, by definition,
undefined behavior, and thus quite undesirable to rely upon.
In many ways we just copy ECMAScript regex semantics, with some
extensions from Python and Perl.
This one seems pretty easy to fix though. I will probably make
a documentation PR at some point.
Please do.
-- The Captures struct does not seem to have a way to access
all of the strings matched by a submatch in repetition
context, not to mention nested repetition contexts.
Just like any other regex library.
Counterexample:
https://msdn.microsoft.com/en-us/library/system.text.regularexpressions.group.captures(v=vs.110).aspx#code-snippet-3
Horrible, horrible idea. Performance-wise that is.
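
To make the "submatch in repetition context" point concrete, here is a small sketch. It uses Python's re module purely as a stand-in (an assumption on my part; it is not std.regex and says nothing about how Captures behaves), and the input string is made up. It shows the common behaviour where a group inside a repetition keeps only its last match, plus the usual workaround of a second pass.

```python
import re

text = "a,b,c"  # made-up input for illustration

# One capture group inside a repetition: each iteration overwrites the group,
# so only the last iteration's text is available afterwards.
m = re.fullmatch(r"(?:(\w+),?)+", text)
last_only = m.group(1)

# Workaround: iterate over the individual items with a second, simpler pattern.
all_items = [item.group(0) for item in re.finditer(r"\w+", text)]

print(last_only)
print(all_items)
```

A .NET-style capture list would return all three items from the first match directly, at the cost of the engine recording every iteration.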
I actually have a strong interest in this. And not because I
need to write regular expressions that extract lists of
patterns all the time (well, it might've happened). More
importantly: this would make it easier to integrate Phobos'
regex engine into a parser generator framework.
No-no-no. Please don't.
Current plans involve regular expression + parsing expression
grammars. I'm pretty sure it is possible to mechanically
convert a subset of PEGs into Regexes and gain some useful
optimizations, but this requires granular control over regular
expression captures to be able to extract the text matched by
the original PEG symbols.
This is heavily misguided, but a common idea. PEGs are actually
way simpler than regex. PEG's * and + have nothing to do with
regex * and + quantifiers.
In PEG, [ab]*b will never match, because [ab]* eats any sequence of
a's and b's and never backtracks.
In regex, [ab]*b will match b, ab, bb, ... because regex
'backtracks'.
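
A rough sketch of the [ab]*b difference, written in Python only for illustration. peg_star_then_b is a hypothetical hand-rolled function that mimics PEG's possessive star (it is not taken from any PEG library), and Python's re stands in for a backtracking regex engine.

```python
import re

def peg_star_then_b(text):
    """PEG-style [ab]*b: the star is possessive and never gives characters back."""
    pos = 0
    # [ab]* : consume as many 'a'/'b' characters as possible, no backtracking.
    while pos < len(text) and text[pos] in "ab":
        pos += 1
    # b : must match at the current position, but the star already ate every 'b'.
    if pos < len(text) and text[pos] == "b":
        return pos + 1
    return None  # parse failure

def regex_star_then_b(text):
    """Regex [ab]*b: the engine backtracks so the final 'b' can match."""
    m = re.fullmatch(r"[ab]*b", text)
    return m.end() if m else None

for s in ["b", "ab", "bb"]:
    print(s, peg_star_then_b(s), regex_star_then_b(s))
```

The PEG-style function fails on every input because the loop leaves nothing for the trailing b, while the regex succeeds on all three.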
Thus one can easily see that PEG constructs are trivial to
implement, and provide quite a number of optimizations as well.
I'd argue that PEG can be made to run faster in 'parse' scenario
whereas nothing beats _simple_ regex in 'search' scenario.
ThreadCache can go a long way to help that.
Google didn't help me with this one. Any chance I could get a
link?
The manpage for jemalloc mentions it explicitly; search for
tcache. Also look through jemalloc and related papers; I think
they all call it a thread cache.
https://linux.die.net/man/3/jemalloc
In essence, you keep a small cache of free memory in TLS to serve
the next allocations faster. Some suggest a per-CPU cache, which is
a bit trickier but, interestingly, also avoids contention.
things. I hope your GC attempt works out!
Me too. It won't be a trivial effort though.
Good luck!
Thanks!