Following up on the (Span)RegexQuery topic, I've started working on moving this code to contrib/regex so that it can leverage various regex implementations. I'm making a generic interface that currently (though subject to change) has these methods:

  void compile(String pattern);
  boolean match(String string);
  int prefixLength();
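Here is a minimal sketch of how the three methods above might be filled in using java.util.regex. The interface name RegexImpl and the class JavaUtilRegexImpl are hypothetical; they are not names from the contrib/regex code. match() uses Matcher.matches(), so the whole term must match. prefixLength() returns 0 on the assumption (stated below) that java.util.regex does not expose a literal prefix.

```java
import java.util.regex.Pattern;

// Hypothetical name for the implementation-independent interface;
// the method set mirrors the three methods listed above.
interface RegexImpl {
    void compile(String pattern);
    boolean match(String string);
    int prefixLength();
}

// Sketch of a java.util.regex-backed implementation (an assumption,
// not the actual contrib/regex code).
public class JavaUtilRegexImpl implements RegexImpl {
    private Pattern pattern;

    public void compile(String pattern) {
        this.pattern = Pattern.compile(pattern);
    }

    public boolean match(String string) {
        // matches() requires the entire term to match the pattern
        return pattern.matcher(string).matches();
    }

    public int prefixLength() {
        // No literal prefix is available from java.util.regex, so report 0:
        // the query cannot skip any terms and must enumerate them all.
        return 0;
    }

    public static void main(String[] args) {
        RegexImpl impl = new JavaUtilRegexImpl();
        impl.compile("a.c");
        System.out.println(impl.match("abc"));   // true
        System.out.println(impl.match("abcd"));  // false: matches() needs the whole term
    }
}
```

Returning 0 is the conservative choice. It is always correct, and it simply means no enumeration shortcut.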

I'm going to initially create implementations for both Jakarta Regexp and java.util.regex, and probably for Jakarta ORO as well. I've been able to extract the prefix length using Jakarta Regexp, but I don't believe this is possible with java.util.regex. I haven't looked into Jakarta ORO deeply enough yet to see whether it makes this available.

(Span)RegexQuery will have a setter for specifying which implementation to use, probably defaulting to java.util.regex so that it runs without any external dependencies.

An interesting thing to note...

  Jakarta Regexp:  "a.c" matches "abcd"
  java.util.regex: "a.c" does not match "abcd" using Matcher.matches(), but it does match using Matcher.lookingAt()
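The difference between the java.util.regex matching methods can be seen directly. matches() needs the whole input to match, lookingAt() only needs a match starting at the beginning, and find() accepts a match anywhere. The find() case is included on the assumption that it is the closest analogue to an unanchored engine such as Jakarta Regexp appears to be; AnchoringDemo is just an illustrative class name.

```java
import java.util.regex.Pattern;

public class AnchoringDemo {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("a.c");
        String term = "abcd";
        // matches(): the entire input must match
        System.out.println(p.matcher(term).matches());     // false
        // lookingAt(): the match must start at index 0 but need not reach the end
        System.out.println(p.matcher(term).lookingAt());   // true
        // find(): a match may start anywhere in the input
        System.out.println(p.matcher(term).find());        // true
        // find() also succeeds when the pattern occurs mid-term
        System.out.println(p.matcher("xabcd").find());     // true
        System.out.println(p.matcher("xabcd").lookingAt()); // false
    }
}
```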

In other words, if you want "a.*" to match only terms that begin with "a", the regex must logically be written as "^a.*". This is not really a concern of the regex query itself, but of the underlying matching implementation. For query parsing, though, it would likely be desirable to wrap every regex expression in ^...$ (which is generally what users mean when they write "a.*").
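On the query-parsing side, a small helper could do that wrapping. In this sketch anchor() is a hypothetical helper, and Matcher.find() stands in for an engine that matches anywhere in the term. It wraps the expression as ^(?:...)$ rather than a bare ^...$. The non-capturing group keeps an alternation such as "ab|cd" entirely inside the anchors, whereas "^ab|cd$" parses as (^ab)|(cd$).

```java
import java.util.regex.Pattern;

public class AnchorHelper {
    // Hypothetical query-parser helper: force the expression to match the
    // whole term. (?:...) groups the user's regex without capturing.
    static String anchor(String userRegex) {
        return "^(?:" + userRegex + ")$";
    }

    // find() models an engine that accepts a match anywhere in the term
    static boolean unanchoredSearch(String regex, String term) {
        return Pattern.compile(regex).matcher(term).find();
    }

    public static void main(String[] args) {
        System.out.println(unanchoredSearch("a.*", "banana"));          // true: "a" found mid-term
        System.out.println(unanchoredSearch(anchor("a.*"), "banana"));  // false
        System.out.println(unanchoredSearch(anchor("a.*"), "apple"));   // true
        // Naive wrapping breaks alternation: "^ab|cd$" means (^ab)|(cd$)
        System.out.println(unanchoredSearch("^ab|cd$", "abx"));         // true
        System.out.println(unanchoredSearch(anchor("ab|cd"), "abx"));   // false
    }
}
```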

I'm also considering having the implementation-independent interface specify a method to rotate an expression, though this is a more advanced feature that perhaps belongs at a different layer.

I'm open to suggestions on all of this. My main goal is to provide a general-purpose regular expression query that is as fast as possible, by minimizing term enumeration.
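A prefix length cuts down term enumeration because the term dictionary is sorted. The query can seek to the first term that starts with the literal prefix and stop at the first term that doesn't. This sketch rests on two assumptions. The term dictionary is modeled as a TreeSet<String>, standing in for Lucene's term enumeration. prefixLength is taken to count leading literal characters of the pattern, so the prefix is pattern.substring(0, prefixLength). matchingTerms() is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;
import java.util.regex.Pattern;

public class PrefixEnumerationSketch {
    // Assumes the first prefixLength characters of regex are literal text.
    static List<String> matchingTerms(TreeSet<String> terms, String regex, int prefixLength) {
        Pattern p = Pattern.compile(regex);
        String prefix = regex.substring(0, prefixLength);
        List<String> hits = new ArrayList<>();
        // Seek to the first term >= prefix instead of starting at the beginning
        for (String term : terms.tailSet(prefix, true)) {
            // Sorted order: once a term lacks the prefix, no later term has it
            if (!term.startsWith(prefix)) {
                break;
            }
            // Only terms sharing the prefix are run through the regex
            if (p.matcher(term).matches()) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>(
                Arrays.asList("apple", "banana", "cab", "cabin", "cobalt", "cube"));
        // Prefix "cab": only cab, cabin and cobalt are visited before stopping
        System.out.println(matchingTerms(terms, "cab.*", 3));  // [cab, cabin]
        // Prefix length 0: every term is visited
        System.out.println(matchingTerms(terms, "c.b.*", 0));  // [cab, cabin, cobalt, cube]
    }
}
```

With a prefix length of 0, as a java.util.regex implementation would report, the loop degrades to a full scan of every term.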

        Erik

