Hi Ronald,

Caveat - I haven't tested this, but:

With a RegexQuery 
<http://lucene.apache.org/java/2_3_2/api/org/apache/lucene/search/regex/RegexQuery.html>,
 I think you can do something like (using your example):

   +abc*123 -{Regex}(?!abc.*123$)

This query would include all documents that have terms that match the wildcard 
"abc*123", and exclude all documents containing terms that don't match regex 
"^abc.*123$".

Note that the Lucene QueryParser doesn't handle regex queries (and if it did, 
the syntax would probably be different than "{Regex}" - this was intended 
solely for purposes of exposition).  As a result, you would have to manually 
construct the RegexQuery and combine it using BooleanQuery clauses with your 
wildcard query.

The "(?!...)" syntax is a negative lookahead assertion - this is a Java 1.4+ 
java.util.regex.Pattern feature.  Note that wildcard expressions are easily 
programmatically converted to regular expressions by substituting "*"->".*" and 
"?"->".", and then adding the "$" anchor.  The "^" anchor is not required with 
RegexQuery's, because when using the java.util.regex engine (the default 
engine), j.u.r.Matcher.lookingAt() is used; from 
<http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Matcher.html#lookingAt()>:

   Attempts to match the input sequence, starting at the
   beginning, against the pattern.

   Like the matches method, this method always starts at the
   beginning of the input sequence; unlike that method, it
   does not require that the entire input sequence be matched. 

Caveat #2: RegexQuery's are relatively slow, since *all* index terms have to be 
tested against the regular expression, so you may have to use some other method 
if query response time turns out to be a problem.

Steve

On 07/20/2008 at 8:29 AM, Ronald Rudy wrote:
> A query solution is preferable.. but I can programmatically
> filter my results after the fact, it just seems like something that
> the Lucene team should consider adding.. I think it would only have
> value for wildcard queries, but nonetheless it would have some value
> I think..
> 
> -Ron
> 
> On Jul 18, 2008, at 6:24 PM, eks dev wrote:
> 
> > Analyzer that detects your condition "ALL match something", if
> > possible at all...
> > e.g. "800123456 80034543534 80023423423" -> 800
> > 
> > than you put it in ALL_MATCH field and match this condition against
> > it... if this prefix needs to be variable, you could extract all
> > matching prefixes to this fiield an make your query work like
> > "ALL_MATCH:800" and care not for the rest :) than yo would not need
> > field1 at all for these queries
> > 
> > you were looking for something like this or you need "Query solution"?
> > 
> > ----- Original Message ----
> > > From: Chris Hostetter <[EMAIL PROTECTED]>
> > > To: java-user@lucene.apache.org
> > > Sent: Saturday, 19 July, 2008 12:00:39 AM
> > > Subject: Re: Boolean expression for no terms OR matching a wildcard
> > > 
> > > > Maybe this is easier ... suppose what I'm indexing is a phone number,
> > > > and there are multiple phone numbers for what I'm indexing under the
> > > > same field (phone) and I want the wildcard query to match only
> > > > records that have either no phone numbers at all OR where ALL phone
> > > > numbers are in a specific area code (e.g. 800* would match all in the
> > > > 800 area code).
> > > 
> > > i can't think of anyway to accomplish the second part of your query.
> > > specificly, given the following records...
> > > 
> > >  Doc1: field1:AAA, field1:Aaa, field1:Bb, field1:C, field2:X, field3:Y
> > >  Doc2: field1:AAA, field1:Aaa, field1:Aa, field2:Z
> > > 
> > > ...i can't think of any type of query like field1:A* which would match
> > > Doc2 but not Doc1 (because there are other field1 values that do
> > > not start with 'A')
> > >
> > > -Hoss



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to