On Sunday, 10 September 2017 at 11:47:02 UTC, Dmitry Olshansky
wrote:
On Sunday, 10 September 2017 at 00:16:10 UTC, Chad Joan wrote:
I was working on std.regex a bit myself, so I created this bug
report to capture some of the findings/progress:
https://issues.dlang.org/show_bug.cgi?id=17820
It seems like something you might be interested in, or might
even have a small chance of fixing in the course of other
things.
Yeah, a well-known problem. The solution is to keep a bit of memory
cached, e.g. in a TLS variable.
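This is a minimal sketch of the thread-local caching idea. It is written in Python rather than D, and it assumes the cost being avoided is allocating fresh working memory on every match call. Each thread lazily allocates one buffer, grows it when a larger one is needed, and reuses it afterwards. The helper name `scratch_buffer` is hypothetical and is not part of std.regex.

```python
import threading

# Per-thread storage: each thread sees its own attributes on this object.
_tls = threading.local()

def scratch_buffer(size):
    """Return this thread's cached scratch buffer (hypothetical helper).

    The buffer is allocated once per thread and regrown only when a caller
    needs more than the current capacity. Contents are NOT cleared between
    uses, so callers must not assume zeroed memory.
    """
    buf = getattr(_tls, "buf", None)
    if buf is None or len(buf) < size:
        buf = bytearray(size)
        _tls.buf = buf
    return buf
```

Because the cache lives in thread-local storage, repeated calls on one thread reuse the same memory without any locking. Other threads still get buffers of their own.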
Indeed.
Is there another issue I can mark it as a duplicate of?
[...]
-- The Captures struct does not specify what value is returned
for submatches that were in the branch of an alternation that
wasn't taken or in a repetition that matched zero times or more
than once.
As with every engine out there, the value is "", the empty string.
I usually don't refer to other libraries while using a library.
If an API doesn't define something, then it is, by definition,
undefined behavior, and thus quite undesirable to rely upon.
This one seems pretty easy to fix though. I will probably make a
documentation PR at some point.
-- The Captures struct does not seem to have a way to access
all of the strings matched by a submatch in repetition
context, not to mention nested repetition contexts.
Just like any other regex library.
Counterexample:
https://msdn.microsoft.com/en-us/library/system.text.regularexpressions.group.captures(v=vs.110).aspx#code-snippet-3
I actually have a strong interest in this. And not because I
need to write regular expressions that extract lists of patterns
all the time (well, it might've happened). More importantly:
this would make it easier to integrate Phobos' regex engine into
a parser generator framework. My current plans combine regular
expressions with parsing expression grammars (PEGs). I'm pretty sure
it is possible to mechanically convert a subset of PEGs into regexes
and gain some useful optimizations, but this requires granular
control over regular expression captures to be able to extract
the text matched by the original PEG symbols.
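The sketch below illustrates the limitation under discussion. It uses Python's standard `re` module as a familiar stand-in for std.regex. A group inside a repetition reports only its last iteration. Recovering every iteration means re-scanning the matched text with the inner pattern. The helper names `last_repetition` and `all_repetitions` are hypothetical. The re-scan trick also assumes that the inner pattern can be matched on its own, and this generally does not hold for the nested repetitions mentioned above.

```python
import re

OUTER = r"(?:(\d+),?)+"   # group 1 sits inside a repetition
INNER = r"(\d+),?"        # the repeated piece on its own

def last_repetition(text):
    # re keeps only the final capture of a repeated group.
    m = re.fullmatch(OUTER, text)
    return m.group(1) if m else None

def all_repetitions(text):
    # Hypothetical workaround: re-scan with the inner pattern to recover
    # every iteration. This breaks down for nested repetitions, where inner
    # matches can no longer be attributed to a specific outer iteration.
    if re.fullmatch(OUTER, text) is None:
        return []
    return [m.group(1) for m in re.finditer(INNER, text)]
```

For example, `last_repetition("1,22,333")` gives only `"333"`. By contrast, `all_repetitions("1,22,333")` recovers all three numbers, which is roughly what .NET's `Group.Captures` exposes directly.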
I'm not sure how much those mentions help without proper bug
reports, but at least I got it off my chest (for the time
being) without having to spend my whole weekend writing bug
reports ;)
Well, they are warmly welcome should you get to it.
Thanks!
...
Dmitry, I appreciate your working towards making the regex
module easier to work on. Thanks.
...
I'm curious what you're thinking about when you mention
something ambitious like writing a new GC :)
(like this https://imgur.com/cWa4evD)
I can't help but fantasize about cheap ways to get GC
allocations to parallelize well and not end up writing an
entire generational collector!
ThreadCache can go a long way toward helping with that.
Google didn't help me with this one. Any chance I could get a
link?
But I doubt I'll ever have the opportunity to work on such
things. I hope your GC attempt works out!
Me too. It won't be a trivial effort, though.
Good luck!