19-Jun-2014 11:36, Timothee Cour via Digitalmars-d пишет:
On Wed, Jun 18, 2014 at 12:37 PM, Dmitry Olshansky via Digitalmars-d
<[email protected] <mailto:[email protected]>> wrote:
18-Jun-2014 21:38, Timothee Cour via Digitalmars-d пишет:
I made a simple modification to std.regex to allow an option to
prefix
match a regex.
Formally, if L(R) is the language recognized by a regex R, the
language
recognized by prefix matching of R is:
L(p(R)) = prefix(L(R)) = {u : uv in L(R) for some v}
Trying to come up (by hand or algorithmically) with a regex R'
such that
L(R') L(p(R)) is awkward and inefficient, eg:
So essentially it's just a set of all partial matches?
Not sure, that depends on your definition of partial matches; how do you
formally define partial matches?
There's a subtlety, let's call my proposal prefixMatch:
"hel".prefixMatch("^hello") //true
"".prefixMatch("^hello") //true
"mel".prefixMatch("^hello") //false
But if the regex doesn't start with `^` then it is not useful:
"whatever".prefixMatch("hello") //true
because the scanner will try each position in the input string and fail,
except for the last one where it'll succeed trivially ("" is a prefix of
"hello").
Well, I think partialMatch is only defined in the current position.
Unlike normal match we can't provide default meaningful action e.g. if a
trivial "" result means search next or just report trivial partial match.
In other words, this is more useful for one shot matching.
Yes, it makes that much more sense for one shot matching IMO.
But we can also consider partial partial matches if we provide
additional constraints given via a lambda:
"blabla hel blabla".partialMatch!(a=>a.hit.length>=3) ("hello")
=> finds a match ("hel") because the lambda succeeds on this match.
partialMatch could also be used to find the longest partial match
(either in terms of length of matching string or length of the regex):
auto longestPartialMatch(R)(string input, R regex){
string longest;
while(true){
auto m=partialMatch!(a=>a.hit.length>longest.length) (input,regex);
if(m) longest=a.hit; else return longest;
}
Seems doable as helpers on top, but this kind of thing could be made
faster if built-in in std.regex like you suggest. I also can't see
anything better then lambda at the moment.
Would be cool if you could take a look at
https://github.com/D-__Programming-Language/phobos/__pull/2216
<https://github.com/D-Programming-Language/phobos/pull/2216>
It's nothing special except splitting up std.regex into a package
but it's a stepping stone of going forward with it.
That's a very welcome change, this module is too big currently. That
seems like it would be the 1st phobos module to take advantage of
package feature?
There is competition of std.container!
https://github.com/D-Programming-Language/phobos/pull/2174
Truth be told it's the most obvious candidate, as containers have
nothing to do with each other.
Example use case:
I wrote a function to search a file given a regex, and it is
optimized
to prune directories early on if they fail to prefix match the
regex, eg:
I'm not sure I follow - a normal regex match would do essentially
the same by first checking the prefix and failing on wrong ones.
Where is optimization?
The optimization is avoiding unnecessary file system access by pruning
early on directories that we can prove will not contain any file which
will match the regex. Without optimization, you have to consider (and
reject) `abc/bad_subfolder/file_1`. With optimization, you never have to
consider that file (nor ask the file system to list it), as
`abc/bad_subfolder/` fails the prefix regex test (see my example). A
normal regex needs to be applied to the whole path, and you need prefix
matching to reject partial paths (ie, directories).
Now I see - you don't have the full input while traversing directories,
cool idea.
What could be interesting is to limit matching to not go beyond say
80 code units and report if it's (partial) match of given pattern.
That'd be nice; it needs to be as generic as possible; I propose using a
user defined lambda (see above).
Well, why not but I'd suggest to focus on getting the partialMatch at
the current position as a good generic building block to extend from.
--
Dmitry Olshansky