Thanks Steve, this looks promising even if it doesn't perform the
best. I'll run some tests on what produces the best results.
-Ron
On Jul 21, 2008, at 3:00 PM, Steven A Rowe wrote:
Hi Ronald,
Caveat - I haven't tested this, but:
With a RegexQuery <http://lucene.apache.org/java/2_3_2/api/org/apache/lucene/search/regex/RegexQuery.html>,
I think you can do something like (using your example):
+abc*123 -{Regex}(?!abc.*123$)
This query would include all documents that have terms that match
the wildcard "abc*123", and exclude all documents containing terms
that don't match regex "^abc.*123$".
Note that the Lucene QueryParser doesn't handle regex queries (and
if it did, the syntax would probably be different than "{Regex}" -
this was intended solely for purposes of exposition). As a result,
you would have to manually construct the RegexQuery and combine it
using BooleanQuery clauses with your wildcard query.
The "(?!...)" syntax is a negative lookahead assertion - this is a
Java 1.4+ java.util.regex.Pattern feature. Note that wildcard
expressions are easily programmatically converted to regular
expressions by substituting "*"->".*" and "?"->".", and then adding
the "$" anchor. The "^" anchor is not required with RegexQuery,
because when using the java.util.regex engine (the default engine),
j.u.r.Matcher.lookingAt() is used; from <http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Matcher.html#lookingAt()>:
Attempts to match the input sequence, starting at the
beginning, against the pattern.
Like the matches method, this method always starts at the
beginning of the input sequence; unlike that method, it
does not require that the entire input sequence be matched.
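The wildcard-to-regex conversion and the lookingAt() behaviour described above can be sketched with java.util.regex alone. This is a minimal, untested-against-Lucene illustration: the class and method names (WildcardRegex, toRegex, toNegatedRegex, lookingAt) are made up for this example, and the escaping of regex metacharacters is an extra step assumed here that the description above does not mention.

```java
import java.util.regex.Pattern;

final class WildcardRegex {
    // Characters escaped so that they stay literal in the generated regex.
    // This escaping is an assumption added for the sketch.
    private static final String META = "\\.[]{}()+-^$|";

    // "*" -> ".*", "?" -> ".", then append the "$" anchor.
    static String toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < wildcard.length(); i++) {
            char c = wildcard.charAt(i);
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                if (META.indexOf(c) >= 0) {
                    sb.append('\\');
                }
                sb.append(c);
            }
        }
        return sb.append('$').toString();
    }

    // Negative-lookahead form: succeeds (via lookingAt) only on terms
    // that do NOT match the wildcard from start to end.
    static String toNegatedRegex(String wildcard) {
        return "(?!" + toRegex(wildcard) + ")";
    }

    // Same matching mode as quoted above: anchored at the start of the
    // input, but not required to consume all of it.
    static boolean lookingAt(String regex, String term) {
        return Pattern.compile(regex).matcher(term).lookingAt();
    }
}
```

Under these assumptions, toNegatedRegex("abc*123") yields the "(?!abc.*123$)" pattern from the example query. Because lookingAt() always starts at the beginning of the term, no "^" is needed, while the "$" keeps a term such as "abcX1234" from being treated as a match.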
Caveat #2: RegexQueries are relatively slow, since *all* index terms
have to be tested against the regular expression, so you may have to
use some other method if query response time turns out to be a
problem.
Steve
On 07/20/2008 at 8:29 AM, Ronald Rudy wrote:
A query solution is preferable, but I can programmatically
filter my results after the fact. It just seems like something that
the Lucene team should consider adding. I think it would only have
value for wildcard queries, but it would still have some value.
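For reference, the after-the-fact filtering could look roughly like the sketch below. It assumes the field values of each hit have already been loaded into a List<String>; the class and method names (AllValuesFilter, compileWildcard, accept) are hypothetical and not part of Lucene.

```java
import java.util.List;
import java.util.regex.Pattern;

final class AllValuesFilter {
    // Hypothetical helper: turn a wildcard into a Pattern
    // ("*" -> ".*", "?" -> "."); every other character is quoted.
    static Pattern compileWildcard(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < wildcard.length(); i++) {
            char c = wildcard.charAt(i);
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    // Keep a document when it has no values at all, or when every
    // value matches the wildcard from start to end.
    static boolean accept(List<String> values, String wildcard) {
        Pattern p = compileWildcard(wildcard);
        for (String v : values) {
            if (!p.matcher(v).matches()) {
                return false;
            }
        }
        return true;
    }
}
```

With this, a hit whose phone values are all in the 800 area code passes accept(values, "800*"), a hit with no phone values also passes, and a hit with one value outside 800 is dropped.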
-Ron
On Jul 18, 2008, at 6:24 PM, eks dev wrote:
Analyzer that detects your condition "ALL match something", if
possible at all...
e.g. "800123456 80034543534 80023423423" -> 800
then you put it in an ALL_MATCH field and match this condition against
it... if this prefix needs to be variable, you could extract all
matching prefixes to this field and make your query work like
"ALL_MATCH:800" and not care about the rest :) then you would not need
field1 at all for these queries
were you looking for something like this, or do you need a "Query
solution"?
----- Original Message ----
From: Chris Hostetter <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Saturday, 19 July, 2008 12:00:39 AM
Subject: Re: Boolean expression for no terms OR matching a wildcard
Maybe this is easier ... suppose what I'm indexing is a phone number,
and there are multiple phone numbers for what I'm indexing under the
same field (phone) and I want the wildcard query to match only
records that have either no phone numbers at all OR where ALL phone
numbers are in a specific area code (e.g. 800* would match all in the
800 area code).
i can't think of any way to accomplish the second part of your query.
specifically, given the following records...
Doc1: field1:AAA, field1:Aaa, field1:Bb, field1:C, field2:X, field3:Y
Doc2: field1:AAA, field1:Aaa, field1:Aa, field2:Z
...i can't think of any type of query like field1:A* which would match
Doc2 but not Doc1 (because there are other field1 values that do
not start with 'A')
-Hoss
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]