On Sunday, 10 September 2017 at 11:47:02 UTC, Dmitry Olshansky wrote:
On Sunday, 10 September 2017 at 00:16:10 UTC, Chad Joan wrote:
I was working on std.regex a bit myself, so I created this bug report to capture some of the findings/progress:
https://issues.dlang.org/show_bug.cgi?id=17820

It seems like something you might be interested in, or might even have a small chance of fixing in the course of other things.

Yeah, a well-known problem. The solution is to keep a bit of memory cached, e.g. in a TLS variable.
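For readers unfamiliar with the idea, here is a minimal sketch of a per-thread cached buffer in D. It is not taken from std.regex or from the bug report; the names `scratch` and `acquireScratch` are hypothetical, and the only language fact it relies on is that module-level variables in D are thread-local by default.

```d
// Hypothetical per-thread scratch buffer. Module-level variables in D
// live in thread-local storage unless marked shared or __gshared.
private ubyte[] scratch;

// Returns a slice of at least n bytes, allocating only when the cached
// buffer is too small for the request.
ubyte[] acquireScratch(size_t n)
{
    if (scratch.length < n)
        scratch = new ubyte[](n); // grow the cache once, then reuse it
    return scratch[0 .. n];
}

void main()
{
    auto a = acquireScratch(64);
    auto b = acquireScratch(32); // served from the same cached block
    assert(a.ptr is b.ptr);
}
```

The trade-off is that the buffer stays alive for the lifetime of the thread, and callers must not hold on to a slice across a later call that may overwrite it.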


Indeed.

Is there another issue I can mark it as a duplicate of?


[...]
-- The Captures struct does not specify what value is returned for submatches that were in the branch of an alternation that wasn't taken or in a repetition that matched 0 or more than 1 times.

As in every engine out there, the value is "", the empty string.


I usually don't refer to other libraries while using a library. If an API doesn't define something, then it is, by definition, undefined behavior, and thus quite undesirable to rely upon.

This one seems pretty easy to fix though. I will probably make a documentation PR at some point.
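A small sketch of the behavior under discussion, using `matchFirst` and `regex` from std.regex. The helper `groupOrEmpty` is hypothetical, and the assertions reflect the behavior described in this thread rather than a documented guarantee.

```d
import std.regex;

// Hypothetical helper: returns the text of one capture group of the
// first match, or "" when the pattern does not match at all.
string groupOrEmpty(string input, string pattern, size_t group)
{
    auto c = matchFirst(input, regex(pattern));
    return c.empty ? "" : c[group];
}

void main()
{
    // Group 1 sits in the alternation branch that was not taken.
    assert(groupOrEmpty("b", "(a)|(b)", 1) == "");
    assert(groupOrEmpty("b", "(a)|(b)", 2) == "b");

    // Group 1 sits inside a repetition that matched zero times.
    assert(groupOrEmpty("b", "(a)*b", 1) == "");
}
```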


-- The Captures struct does not seem to have a way to access all of the strings matched by a submatch in repetition context, not to mention nested repetition contexts.


Just like any other regex library.


Counterexample: https://msdn.microsoft.com/en-us/library/system.text.regularexpressions.group.captures(v=vs.110).aspx#code-snippet-3

I actually have a strong interest in this. And not because I need to write regular expressions that extract lists of patterns all the time (well, it might've happened). More importantly: this would make it easier to integrate Phobos' regex engine into a parser generator framework. Current plans involve regular expression + parsing expression grammars. I'm pretty sure it is possible to mechanically convert a subset of PEGs into Regexes and gain some useful optimizations, but this requires granular control over regular expression captures to be able to extract the text matched by the original PEG symbols.
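To make the limitation concrete, here is a sketch in D. A group inside a repetition keeps only the text of its last iteration, so one possible workaround is a second pass with `matchAll` over the matched text. The helper `allRepeats` and the sample pattern are hypothetical, and the workaround only suits simple, non-nested repetitions; it is not a replacement for a .NET-style `Captures` collection.

```d
import std.algorithm.iteration : map;
import std.array : array;
import std.regex;

// Hypothetical workaround: match the whole repeated run first, then
// rescan it to collect every iteration of the inner group.
string[] allRepeats(string input)
{
    auto whole = matchFirst(input, regex(`(?:(\d+);)+`));
    if (whole.empty)
        return [];
    return matchAll(whole.hit, regex(`(\d+);`))
        .map!(m => m[1])
        .array;
}

void main()
{
    auto c = matchFirst("1;22;333;", regex(`(?:(\d+);)+`));
    assert(c.hit == "1;22;333;");
    assert(c[1] == "333"); // only the last iteration is retained

    assert(allRepeats("1;22;333;") == ["1", "22", "333"]);
}
```

The second pass repeats matching work and has to duplicate the inner pattern by hand, which is the kind of bookkeeping that direct access to per-iteration captures would avoid.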


I'm not sure how much those mentions help without proper bug reports, but at least I got it off my chest (for the time being) without having to spend my whole weekend writing bug reports ;)


Well, they are warmly welcome should you get to it.


Thanks!

...

Dmitry, I appreciate your working towards making the regex module easier to work on. Thanks.

...

I'm curious what you're thinking about when you mention something ambitious like writing a new GC :)
(like this https://imgur.com/cWa4evD)

I can't help but fantasize about cheap ways to get GC allocations to parallelize well and not end up writing an entire generational collector!

ThreadCache can go a long way to help that.


Google didn't help me with this one. Any chance I could get a link?

But I doubt I'll ever have the opportunity to work on such things. I hope your GC attempt works out!

Me too. It won't be a trivial effort though.

Good luck!
