18-Jun-2014 21:38, Timothee Cour via Digitalmars-d пишет:
I made a simple modification to std.regex to allow an option to prefix
match a regex.
Formally, if L(R) is the language recognized by a regex R, the language
recognized by prefix matching of R  is:

L(p(R)) = prefix(L(R)) = {u : uv in L(R) for some v}
Trying to come up (by hand or algorithmically) with a regex R' such that
L(R') L(p(R)) is awkward and inefficient, eg:

So essentially it's just a set of all partial matches?

R='hello';
R'=`|h|he|hell|hello` = `(h(e(l(l(o)?)?)?)?)?`;

However thinking in terms of state machine this is much easier and
efficient.

It looks like this:
assert("hel".match(`hello\d+`.regex("p")); //p for prefix match


Well, since "p" would apply to operation rather then regex I think adding a partialMatch function would be simpler.

Plus it may need to return extended information like where in the pattern it stopped, so the return type/API may need to be different from normal match.

If there's interest in adding this I can prepare a pull request and we
can discuss more.

Sure partial matching wold be nice to have and any help in this direction is warmly welcome.

Would be cool if you could take a look at
https://github.com/D-Programming-Language/phobos/pull/2216

It's nothing special except splitting up std.regex into a package but it's a stepping stone of going forward with it.

Example use case:
I wrote a function to search a file given a regex, and it is optimized
to prune directories early on if they fail to prefix match the regex, eg:

I'm not sure I follow - a normal regex match would do essentially the same by first checking the prefix and failing on wrong ones. Where is optimization?

What could be interesting is to limit matching to not go beyond say 80 code units and report if it's (partial) match of given pattern.

dirEntriesOptimized(`abc/folder_\d+/\w+\.cpp`)
when encountering `abc/bad_subfolder/` it will not recurse on this as it
fails the prefix regex match.



--
Dmitry Olshansky

Reply via email to