On Sunday, 10 September 2017 at 18:54:21 UTC, Chad Joan wrote:
On Sunday, 10 September 2017 at 11:47:02 UTC, Dmitry Olshansky wrote:

Yeah, well known problem. Solution is to keep a bit of memory cached eg in TLS variable.


Indeed.

Is there another issue I can mark it as a duplicate of?

No it's just somthing that showed up in a number of benchmarks people posted, never got around to it. So just file it so I'll have an issue to tick once I fix this.



[...]
-- The Captures struct does not specify what value is returned for submatches that were in the branch of an alternation that wasn't taken or in a repetition that matched 0 or more than 1 times.

As every engine out there the value is "", empty string.


I usually don't refer to other libraries while using a library.
If an API doesn't define something, then it is, by definition, undefined behavior, and thus quite undesirable to rely upon.

In many way we just copy ECMAScript regex semantics, with some extensions from Python and Perl.


This one seems pretty easy to fix though. I will probably make a documentation PR at some point.


Please do.

-- The Captures struct does not seem to have a way to access all of the strings matched by a submatch in repetition context, not to mention nested repetition contexts.


Just like any other regex library.


Counterexample: https://msdn.microsoft.com/en-us/library/system.text.regularexpressions.group.captures(v=vs.110).aspx#code-snippet-3


Horrible, horrible idea. Performance-wise that is.

I actually have a strong interest in this. And not because I need to write regular expressions that extract lists of patterns all the time (well, it might've happened). More importantly: this would make it easier to integrate Phobos' regex engine into a parser generator framework.

No-no-no. Please don't.

Current plans involve regular expression + parsing expression grammars. I'm pretty sure it is possible to mechanically convert a subset of PEGs into Regexes and gain some useful optimizations, but this requires granular control over regular expression captures to be able to extract the text matched by the original PEG symbols.

This is heavily misguided, but a common idea. PEGs are actually way simpler then regex. PEGs * and + have nothing to do with regex * and + qualifiers.

In PEG [ab]*b will never match because [ab]* eats any sequence of a-s and b-s and never backtracks.

In regex [ab]* will match b, ab, bb, .... because regex 'backtracks'.

Thus one can easily see that PEGs constructs are trivial to implement, and provide quite a number of optimizations as well.

I'd argue that PEG can be made to run faster in 'parse' scenario whereas nothing beats _simple_ regex in 'search' scenario.

ThreadCache can go a long way to help that.


Google didn't help me with this one. Any chance I could get a link?
The manpage for jemalloc mentions it explicitly search for tcache. Also look through jemalloc and related papers, I think they all call it thread cache.

 https://linux.die.net/man/3/jemalloc

In essene you keep a small cache of free memory in TLS to serve next allocations faster. Some suggest a per-cpu cache, which is bit trickier but also interestingly avoid contention.

things.  I hope your GC attempt works out!

Me too. It's won't be trivial effort though.

Good luck!

Thanks!


Reply via email to