On Thu, May 27, 2010 at 10:23 PM, Danny Sokolsky < [email protected]> wrote:
> Sorry this one slipped through the cracks for a while...better late than
> never.

NP! Thanks for looking at it.

> We think this is not actually a bug, even though it appears so at first
> glance.

Hmm...

> The reason is that the specification is vague and leaves certain
> details to the implementation. It says that ungreedy/reluctant quantifiers
> are required to match the shortest possible substring, but it does not give
> rules for the priority of sub-expressions/capturing groups.

This doesn't quite make sense. I read the specification and looked at my
example again. It has nothing to do with "priority of
sub-expressions/capturing groups". It has only to do with the scope of the
'?'. The spec says "Reluctant quantifiers are supported. They are indicated
by a '?' following a quantifier." I read this to mean that the '?' applies
only to the quantifier that precedes it. That seems very clear: if a
quantifier isn't followed by a '?', it should not be reluctant.

> So, it's up to
> the implementation. If you want to read some gory details, check out
> http://www.w3.org/TR/xpath-functions/#string.match.
> POSIX does define such rules, but it doesn't have the notion of ungreedy
> quantifiers. Perl doesn't try to define the rules; there is no such thing
> as a Perl specification.

The Perl implementation is the Perl specification.

> The closest thing is a description of the
> implementation, which is an inherently low-performance approach involving
> trying one match at a time (i.e. backtracking). This is not a great
> approach.

Perl is open source, and it performs very well. You could open it up and
take a look (granted, I haven't done so -- I don't know whether it would be
a can of worms).

> So the MarkLogic implementation chose the more performant approach.

In my tests so far, MarkLogic is about two or three times slower than Perl
on regular expressions. But I haven't polished the tests, and I might not
be comparing "apples to apples" yet. I'll let you know.

> In the 1.0-ml dialect, however, there is an undocumented "p" flag to the
> functions that take a regex that does the perl-like matching (it is an
> extension to the spec, so it is not available in the 1.0 dialect).

Why isn't it documented? It seems like something that should be.

> I think
> your workaround is a better approach, however.

Cheers!
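The question of the scope of '?' can be illustrated with a small sketch. It uses Python's `re` module, which (like Perl) is a backtracking engine, so it shows Perl-style results only; it says nothing about what MarkLogic's default 1.0 matching returns. The patterns, inputs, and the helper name `groups` are hypothetical, since the original example isn't quoted in this message.

```python
import re

def groups(pattern: str, text: str):
    """Return the capture groups of the leftmost match, or None (hypothetical helper)."""
    m = re.search(pattern, text)
    return m.groups() if m else None

# Hypothetical inputs, for illustration only.

# A reluctant quantifier matches as little as possible...
reluctant = groups(r"<(.+?)>", "<a><b>")
# ...while the same quantifier without '?' is greedy.
greedy = groups(r"<(.+)>", "<a><b>")

# '?' changes only the quantifier it follows: here the first group is
# reluctant and the second (no trailing '?') stays greedy.
mixed = groups(r"(a+?)(a+)", "aaaa")
# With the roles swapped, the first group is greedy and the second reluctant.
swapped = groups(r"(a+)(a+?)", "aaaa")

for name, value in [("reluctant", reluctant), ("greedy", greedy),
                    ("mixed", mixed), ("swapped", swapped)]:
    print(name, value)
```

Under this reading, `(a+?)(a+)` splits "aaaa" as ('a', 'aaa') and `(a+)(a+?)` splits it as ('aaa', 'a'): each '?' affects only its own quantifier, and an unmarked quantifier stays greedy.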
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
