Erik Hatcher wrote:
On Aug 30, 2006, at 6:13 PM, Mark Miller wrote:
* An implementation tying Java's built-in java.util.regex to RegexQuery.
*
* Note that because this implementation currently only returns null from
* {@link #prefix} that queries using this implementation will enumerate and
* attempt to {@link #match} each term for the specified field in the index.
Is this another way to say I'm gonna be friggin' slow? Say it ain't so...
"slow" is relative. It will enumerate all the terms for the specified
field and run a regular expression match on each one. The same thing
happens with FuzzyQuery and prefixed WildcardQuery too. These aren't
necessarily "slow", so try it and see.
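To make the cost concrete, here is a minimal sketch of "enumerate every term and match each one". It is not Lucene code: the sorted set stands in for one field's term dictionary, and the class name, method name, and the choice of a whole-term match are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical stand-in for scanning one field's terms; not Lucene's classes.
final class TermScanSketch {

    // Visits every term and keeps the ones the pattern matches in full.
    // The work grows with the number of distinct terms in the field.
    static List<String> matchAll(SortedSet<String> terms, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (pattern.matcher(term).matches()) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        SortedSet<String> terms =
                new TreeSet<>(Arrays.asList("term1", "term2", "term3", "other"));
        System.out.println(matchAll(terms, "term1|term2")); // prints [term1, term2]
    }
}
```

The loop touches all four terms even though only two match, which is the behavior the javadoc warns about when no prefix is available.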
I want to use this as a multi-phrase query: a SpanNearQuery with a term
that could be the regex "term1|term2".
What about nesting a SpanOrQuery for those two terms inside a
SpanNearQuery?
I need this. Pipe dream for speed on a huge index?
Feel free to implement a robust prefix method :) It's much more
difficult than I wanted to tackle when I created this infrastructure.
But thankfully Jakarta Regexp implemented it, so you could use it for
prefix computation and a different matcher implementation if you like.
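A rough sketch of why a prefix helps, again using a sorted set as a stand-in for the term dictionary rather than Lucene's own classes. The names here are hypothetical, and a null prefix is taken to mean "no prefix known", mirroring the javadoc quoted above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical illustration of prefix-narrowed term enumeration.
final class PrefixScanSketch {

    // Assumes the set uses natural String ordering, so all terms sharing
    // a prefix sit next to each other.
    static List<String> matchWithPrefix(SortedSet<String> terms, String prefix, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> hits = new ArrayList<>();
        // With no known prefix, fall back to scanning every term.
        SortedSet<String> candidates = (prefix == null) ? terms : terms.tailSet(prefix);
        for (String term : candidates) {
            if (prefix != null && !term.startsWith(prefix)) {
                break; // sorted order: no later term can start with the prefix
            }
            if (pattern.matcher(term).matches()) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>(
                Arrays.asList("apple", "tea", "team", "term1", "term2", "zebra"));
        System.out.println(matchWithPrefix(terms, "term", "term[12]")); // prints [term1, term2]
    }
}
```

With the prefix "term" the loop skips "apple", "tea", and "team" entirely and stops at "zebra", so only the terms that could possibly match are tested against the regular expression.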
Erik
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Thanks for the info Erik. I did not realize that WildcardQuery and
FuzzyQuery did this as well. A lot of my concern was that I needed to
implement WildcardQuery as a SpanRegexQuery so that I could get nested
wildcard searches in my proximity searches. If it's the same speed as
WildcardQuery I am not worried. However, it seems like it could be even
faster:
I only need to support * and ? as WildcardQuery does. I don't want to
include Jakarta Regexp with my distro. I made a new regex implementation
based on Java 5's java.util.regex that only allows * and ?.
I pass the pattern string into a short method that:
* Removes single backslashes, halves double backslashes, escapes
* non-alphanumeric characters, and records the prefix. Leaves * and ? alone.
Then I replace * with .* and ? with .{1}.
Only supporting * and ? seems to make grabbing the prefix nice and simple.
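The conversion described above could be sketched roughly as follows. This is a guess at the shape of such a method, not the actual code: the class and field names are made up, everything happens in one pass, `?` is emitted as `.` (equivalent to `.{1}`), and a trailing lone backslash is treated as a literal.

```java
import java.util.regex.Pattern;

// Hypothetical wildcard-to-regex converter supporting only * and ?.
final class WildcardRegexSketch {
    final String regex;   // java.util.regex form of the wildcard pattern
    final String prefix;  // literal text before the first unescaped * or ?

    private WildcardRegexSketch(String regex, String prefix) {
        this.regex = regex;
        this.prefix = prefix;
    }

    static WildcardRegexSketch parse(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder prefix = new StringBuilder();
        boolean inPrefix = true;
        for (int i = 0; i < wildcard.length(); i++) {
            char c = wildcard.charAt(i);
            if (c == '*' || c == '?') {
                regex.append(c == '*' ? ".*" : ".");
                inPrefix = false; // the literal prefix ends at the first wildcard
                continue;
            }
            if (c == '\\' && i + 1 < wildcard.length()) {
                c = wildcard.charAt(++i); // drop the backslash, keep the next char as a literal
            }
            // Escape ASCII non-alphanumerics so the regex engine reads them literally.
            if (c < 128 && !Character.isLetterOrDigit(c)) {
                regex.append('\\');
            }
            regex.append(c);
            if (inPrefix) {
                prefix.append(c);
            }
        }
        return new WildcardRegexSketch(regex.toString(), prefix.toString());
    }

    public static void main(String[] args) {
        WildcardRegexSketch p = parse("te?m*");
        System.out.println(p.regex + " / " + p.prefix);        // prints te.m.* / te
        System.out.println(Pattern.matches(p.regex, "term1")); // prints true
    }
}
```

Because only `*` and `?` are special, the prefix is simply every literal character seen before the first of them, which is what makes it cheap to compute compared with a general regular expression.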
Now my question: should I use this instead of WildcardQuery even when
not in a span search? Sounds like it would be more efficient.
Also, how does a SpanOrQuery work? Is the resulting span anchored at the
start of the word and the length of the word? Like a term span? So that
it's an "or" term span? If there is more than one match, does the span
cover all of them, or is each match a span the size of each hit?
Thanks,
Mark