Re: Going on std.regex & std.uni bug-fixing hunt

Dmitry Olshansky via Digitalmars-d Sun, 10 Sep 2017 13:52:02 -0700

On Sunday, 10 September 2017 at 18:54:21 UTC, Chad Joan wrote:

On Sunday, 10 September 2017 at 11:47:02 UTC, Dmitry Olshansky wrote:
Yeah, well known problem. Solution is to keep a bit of memory cached, e.g. in a TLS variable.
Indeed.

Is there another issue I can mark it as a duplicate of?

No, it's just something that showed up in a number of benchmarks people posted; I never got around to it. So just file it so I'll have an issue to tick once I fix this.

[...]
-- The Captures struct does not specify what value is returned for submatches that were in the branch of an alternation that wasn't taken or in a repetition that matched 0 or more than 1 times.
As in every engine out there, the value is "", the empty string.
I usually don't refer to other libraries while using a library.
If an API doesn't define something, then it is, by definition, undefined behavior, and thus quite undesirable to rely upon.

In many ways we just copy ECMAScript regex semantics, with some extensions from Python and Perl.

This one seems pretty easy to fix though. I will probably make a documentation PR at some point.


Please do.

-- The Captures struct does not seem to have a way to access all of the strings matched by a submatch in repetition context, not to mention nested repetition contexts.
Just like any other regex library.
Counterexample: https://msdn.microsoft.com/en-us/library/system.text.regularexpressions.group.captures(v=vs.110).aspx#code-snippet-3


Horrible, horrible idea. Performance-wise that is.

I actually have a strong interest in this. And not because I need to write regular expressions that extract lists of patterns all the time (well, it might've happened). More importantly: this would make it easier to integrate Phobos' regex engine into a parser generator framework.


No-no-no. Please don't.

Current plans involve regular expression + parsing expression grammars. I'm pretty sure it is possible to mechanically convert a subset of PEGs into Regexes and gain some useful optimizations, but this requires granular control over regular expression captures to be able to extract the text matched by the original PEG symbols.

This is heavily misguided, but a common idea. PEGs are actually way simpler than regex. PEG's * and + have nothing to do with regex's * and + quantifiers.

In PEG, [ab]*b will never match because [ab]* eats any sequence of a-s and b-s and never backtracks.

In regex, [ab]*b will match b, ab, bb, ... because regex 'backtracks'.

Thus one can easily see that PEG constructs are trivial to implement, and provide quite a number of optimizations as well.
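
To make the difference concrete, here is a minimal sketch in Python (used only as a neutral stand-in; it is not std.regex and not any particular PEG library). The helper peg_star_then_b is a hypothetical hand-written matcher that treats the star the way a PEG does: it consumes greedily and never gives characters back.

```python
import re

def peg_star_then_b(text):
    """PEG-style match of [ab]*b (hypothetical helper, for illustration only)."""
    pos = 0
    # [ab]* consumes every leading 'a' or 'b' and never gives any back.
    while pos < len(text) and text[pos] in "ab":
        pos += 1
    # The literal 'b' must match exactly where the star stopped.
    if pos < len(text) and text[pos] == "b":
        return pos + 1
    return None  # failure: a PEG does not backtrack into the star

for s in ["b", "ab", "bb", "abab"]:
    regex_ok = re.fullmatch(r"[ab]*b", s) is not None  # backtracking engine
    peg_ok = peg_star_then_b(s) is not None            # PEG semantics
    print(s, "regex:", regex_ok, "peg:", peg_ok)
```

The regex side succeeds on every input because the engine gives back the final b; the PEG side fails on every input because the star has already consumed it.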

I'd argue that PEG can be made to run faster in a 'parse' scenario, whereas nothing beats a _simple_ regex in a 'search' scenario.

ThreadCache can go a long way to help that.
Google didn't help me with this one. Any chance I could get a link?

The manpage for jemalloc mentions it explicitly; search for tcache. Also look through jemalloc and related papers; I think they all call it thread cache.


 https://linux.die.net/man/3/jemalloc

In essence, you keep a small cache of free memory in TLS to serve the next allocations faster. Some suggest a per-CPU cache, which is a bit trickier but also, interestingly, avoids contention.
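
A rough sketch of the idea, in Python purely for illustration (this is not how jemalloc's tcache or std.regex is implemented). The names acquire, release, CACHE_LIMIT and BUF_SIZE are all made up for this example.

```python
import threading

_tls = threading.local()  # each thread sees its own attributes on this object
CACHE_LIMIT = 8           # hypothetical cap on buffers kept per thread
BUF_SIZE = 4096           # hypothetical fixed buffer size

def acquire():
    """Return a buffer, reusing one from this thread's cache when possible."""
    cache = getattr(_tls, "free", None)
    if cache:
        return cache.pop()       # fast path: no shared state, no lock
    return bytearray(BUF_SIZE)   # slow path: ask the real allocator

def release(buf):
    """Hand a buffer back; keep it in this thread's cache if there is room."""
    cache = getattr(_tls, "free", None)
    if cache is None:
        cache = _tls.free = []
    if len(cache) < CACHE_LIMIT:
        cache.append(buf)
    # Otherwise drop the reference and let the runtime reclaim the buffer.
```

Because the free list lives in thread-local storage, the fast path never touches data shared between threads, which is where the speedup comes from.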

things.  I hope your GC attempt works out!


Me too. It won't be a trivial effort though.


Good luck!


Thanks!
