Thanks Steve, this looks promising even if it doesn't perform the best. I'll run some tests on what produces the best results.

-Ron


On Jul 21, 2008, at 3:00 PM, Steven A Rowe wrote:

Hi Ronald,

Caveat - I haven't tested this, but:

With a RegexQuery <http://lucene.apache.org/java/2_3_2/api/org/apache/lucene/search/regex/RegexQuery.html>, I think you can do something like (using your example):

  +abc*123 -{Regex}(?!abc.*123$)

This query would include all documents that have terms that match the wildcard "abc*123", and exclude all documents containing terms that don't match regex "^abc.*123$".

Note that the Lucene QueryParser doesn't handle regex queries (and if it did, the syntax would probably be different from "{Regex}" - this was intended solely for purposes of exposition). As a result, you would have to construct the RegexQuery manually and combine it with your wildcard query using BooleanQuery clauses.
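A plain-Java sketch of what the combined query is meant to do may help. It does not use Lucene at all: each "document" is just the list of terms in one field, the MUST clause is the wildcard abc*123 (written as the regex abc.*123 and tested with matches()), and the MUST_NOT clause is the negative lookahead tested with lookingAt(), which is how RegexQuery applies java.util.regex patterns. The class and method names are made up for illustration. In Lucene itself you would presumably add a WildcardQuery with BooleanClause.Occur.MUST and a RegexQuery with BooleanClause.Occur.MUST_NOT to a BooleanQuery.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical emulation (no Lucene) of: +abc*123 -{Regex}(?!abc.*123$)
class AllTermsMatchDemo {
    // MUST clause: the wildcard abc*123 as a regex; matches() requires the whole term.
    static final Pattern WILDCARD = Pattern.compile("abc.*123");
    // MUST_NOT clause: lookingAt() succeeds (zero-length) for any term NOT matching ^abc.*123$.
    static final Pattern NON_MATCHING = Pattern.compile("(?!abc.*123$)");

    // A document matches if some term fits the wildcard and no term trips the MUST_NOT pattern.
    static boolean matches(List<String> terms) {
        boolean must = terms.stream().anyMatch(t -> WILDCARD.matcher(t).matches());
        boolean mustNot = terms.stream().anyMatch(t -> NON_MATCHING.matcher(t).lookingAt());
        return must && !mustNot;
    }

    public static void main(String[] args) {
        System.out.println(matches(List.of("abc123", "abcX123"))); // true: every term fits
        System.out.println(matches(List.of("abc123", "xyz")));     // false: "xyz" triggers MUST_NOT
        System.out.println(matches(List.of()));                    // false: MUST needs at least one term
    }
}
```

Note that the last case returns false: documents with no terms in the field are not matched by this query, so the "no phone numbers at all" half of the original question would still need separate handling.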

The "(?!...)" syntax is a negative lookahead assertion - this is a Java 1.4+ java.util.regex.Pattern feature. Note that wildcard expressions can easily be converted to regular expressions programmatically by substituting "*"->".*" and "?"->".", and then appending the "$" anchor. The "^" anchor is not required with RegexQuery, because when using the java.util.regex engine (the default engine), j.u.r.Matcher.lookingAt() is used; from <http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Matcher.html#lookingAt()>:

  Attempts to match the input sequence, starting at the
  beginning, against the pattern.

  Like the matches method, this method always starts at the
  beginning of the input sequence; unlike that method, it
  does not require that the entire input sequence be matched.
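A minimal sketch of that wildcard-to-regex conversion, together with a check of the lookingAt() behaviour quoted above. One assumption beyond the message: literal runs are escaped with Pattern.quote() so that regex metacharacters in the wildcard (such as "." or "+") stay literal. The class name is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper: "*" -> ".*", "?" -> ".", literal runs quoted, then "$" appended.
class WildcardToRegex {
    static String toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*' || c == '?') {
                if (literal.length() > 0) {            // flush the pending literal text
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        // No leading "^": lookingAt() always starts matching at index 0.
        return regex.append('$').toString();
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile(toRegex("abc*123"));
        System.out.println(p.matcher("abcX123").lookingAt());  // true
        System.out.println(p.matcher("abcX1234").lookingAt()); // false: "$" forces the end
        System.out.println(Pattern.compile("abc.*123").matcher("abcX1234").lookingAt()); // true without "$"
        System.out.println(p.matcher("xabc123").lookingAt());  // false: lookingAt starts at 0
    }
}
```

The output shows why the "$" matters and "^" doesn't: without "$", lookingAt() accepts "abcX1234" because its prefix "abcX123" matches, while a term that doesn't start with "abc" is rejected either way.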

Caveat #2: RegexQuery is relatively slow, since *all* index terms have to be tested against the regular expression, so you may have to use some other method if query response time turns out to be a problem.
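A toy model (not Lucene code) of why this caveat matters: the term dictionary is sorted, so a wildcard with a literal prefix such as "abc" in "abc*123" can jump to the first term at or after "abc" and stop as soon as terms stop sharing that prefix; as far as I know, Lucene's wildcard term enumeration works roughly along these lines. A pattern that begins with a negative lookahead has no literal prefix to seek to, so every term gets tested. The TreeSet stands in for the term dictionary, and the class name is hypothetical.

```java
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical cost model: counts how many terms each strategy has to test.
class TermScanCost {
    // Seek to the literal prefix, then stop at the first term that no longer shares it.
    static int prefixScan(NavigableSet<String> terms, String prefix, Pattern p) {
        int tested = 0;
        for (String t : terms.tailSet(prefix, true)) {
            if (!t.startsWith(prefix)) break; // sorted order: no later term can share the prefix
            tested++;
            p.matcher(t).matches();           // the per-term test; only the count is tracked here
        }
        return tested;
    }

    // No usable prefix: every term in the dictionary must be tested.
    static int fullScan(NavigableSet<String> terms, Pattern p) {
        int tested = 0;
        for (String t : terms) {
            tested++;
            p.matcher(t).lookingAt();
        }
        return tested;
    }

    public static void main(String[] args) {
        NavigableSet<String> terms = new TreeSet<>();
        for (int i = 0; i < 1000; i++) terms.add(String.format("t%04d", i)); // unrelated terms
        terms.add("abc123");
        terms.add("abcX123");
        System.out.println(prefixScan(terms, "abc", Pattern.compile("abc.*123"))); // 2
        System.out.println(fullScan(terms, Pattern.compile("(?!abc.*123$)")));     // 1002
    }
}
```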

Steve

On 07/20/2008 at 8:29 AM, Ronald Rudy wrote:
A query solution is preferable, but I can programmatically
filter my results after the fact. It just seems like something that
the Lucene team should consider adding. I think it would only have
value for wildcard queries, but it would have some value
nonetheless.
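A sketch of that after-the-fact filtering for the phone-number case in the original question (quoted further down): keep a hit if it has no phone numbers at all, or if every number starts with the area code. It assumes the phone values are stored and read back for each hit, for example with Lucene's Document.getValues("phone"); here each hit is simply a String[] of those values, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical post-filter over the stored "phone" values of each hit.
class PhonePostFilter {
    static boolean keep(String[] phones, String areaCode) {
        for (String phone : phones) {
            if (!phone.startsWith(areaCode)) return false; // one non-matching number rejects the hit
        }
        return true; // also true when the hit has no phone numbers
    }

    static List<String[]> filter(List<String[]> hits, String areaCode) {
        List<String[]> kept = new ArrayList<>();
        for (String[] phones : hits) {
            if (keep(phones, areaCode)) kept.add(phones);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String[]> hits = List.of(
            new String[] {"8001234567", "8009876543"}, // kept: all in area code 800
            new String[] {"8001234567", "2125551234"}, // dropped: a 212 number is present
            new String[] {});                          // kept: no phone numbers
        System.out.println(filter(hits, "800").size()); // 2
    }
}
```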

-Ron

On Jul 18, 2008, at 6:24 PM, eks dev wrote:

Use an Analyzer that detects your condition "ALL match something", if
that's possible at all...
e.g. "800123456 80034543534 80023423423" -> 800

Then you put it in an ALL_MATCH field and match this condition against
it... If this prefix needs to be variable, you could extract all
matching prefixes into this field and make your query work like
"ALL_MATCH:800" and not care about the rest :) Then you would not need
field1 at all for these queries.

Were you looking for something like this, or do you need a "query solution"?
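A minimal sketch of the prefix extraction behind the ALL_MATCH idea, using the phone numbers from the example above. It is not a Lucene Analyzer; it only computes the terms such an analyzer (or your indexing code) might emit, namely every prefix shared by all of a document's values, so that a plain term query like ALL_MATCH:800 finds documents whose values all start with 800. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical: computes every prefix shared by all values, for indexing into ALL_MATCH.
class AllMatchPrefixes {
    static List<String> sharedPrefixes(List<String> values) {
        List<String> prefixes = new ArrayList<>();
        if (values.isEmpty()) return prefixes; // no values, so no shared prefix to index
        String first = values.get(0);
        for (int len = 1; len <= first.length(); len++) {
            String candidate = first.substring(0, len);
            for (String v : values) {
                // If this prefix isn't shared, no longer prefix can be shared either.
                if (!v.startsWith(candidate)) return prefixes;
            }
            prefixes.add(candidate);
        }
        return prefixes;
    }

    public static void main(String[] args) {
        List<String> phones = List.of("800123456", "80034543534", "80023423423");
        System.out.println(sharedPrefixes(phones)); // [8, 80, 800]
    }
}
```

A document with no values gets no ALL_MATCH terms in this sketch, so it would not be found by ALL_MATCH:800 on its own.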

----- Original Message ----
From: Chris Hostetter <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Saturday, 19 July, 2008 12:00:39 AM
Subject: Re: Boolean expression for no terms OR matching a wildcard

Maybe this is easier... Suppose what I'm indexing is phone numbers, and
there are multiple phone numbers per record under the same field (phone).
I want the wildcard query to match only records that have either no phone
numbers at all OR where ALL phone numbers are in a specific area code
(e.g. 800* would match all numbers in the 800 area code).

I can't think of any way to accomplish the second part of your query.
Specifically, given the following records...

Doc1: field1:AAA, field1:Aaa, field1:Bb, field1:C, field2:X, field3:Y
Doc2: field1:AAA, field1:Aaa, field1:Aa, field2:Z

...I can't think of any type of query like field1:A* that would match
Doc2 but not Doc1 (because Doc1 has other field1 values that do
not start with 'A').

-Hoss



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


