parsing range queries: should they be analyzed?

Doug Cutting Mon, 21 Jan 2002 09:44:09 -0800

(I have some time to work on Lucene this week, so I'm going through old
changes that I made and never had the time to commit or discuss.)


It seems to me that the terms in a range query should not be analyzed.  If
one wishes to, e.g., look for terms starting with "intens", one does not
want this to be stemmed to "inten" and have "intend" match.
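
As a rough illustration of the concern (not Lucene code), the sketch below compares a range check with raw bounds against one with analyzed bounds. The class RangeBoundDemo, the toyAnalyze helper, and the upper bound "intenz" are all assumptions made up for the example; toyAnalyze just lower-cases and strips one trailing "s", standing in for a real stemming analyzer.

```java
import java.util.Locale;

// Hypothetical demo class; nothing here is part of Lucene.
class RangeBoundDemo {

    // Toy stand-in for a stemming analyzer (an assumption for illustration):
    // lower-cases the text and strips a single trailing "s".
    static String toyAnalyze(String text) {
        String lower = text.toLowerCase(Locale.ROOT);
        return lower.endsWith("s") ? lower.substring(0, lower.length() - 1) : lower;
    }

    // Inclusive lexicographic range check, the way an inclusive term range
    // would compare indexed terms against its two bounds.
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        String lower = "intens";
        String upper = "intenz";   // made-up upper bound for the example
        String indexed = "intend"; // a term that should NOT match

        // Raw bounds: "intend" sorts before "intens", so it is excluded.
        System.out.println("raw bounds match: " + inRange(indexed, lower, upper));

        // Analyzed bounds: the lower bound becomes "inten", which admits "intend".
        System.out.println("analyzed bounds match: "
            + inRange(indexed, toyAnalyze(lower), toyAnalyze(upper)));
    }
}
```

With these inputs the raw bounds exclude "intend" while the analyzed bounds admit it, which is the widening described above.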

The diff for this change is below.

Do folks agree, or are there reasons that range query terms must be
analyzed?

Doug



Index: QueryParser.jj
===================================================================
RCS file: /home/cvs/jakarta-lucene/src/java/org/apache/lucene/queryParser/QueryParser.jj,v
retrieving revision 1.9
diff -u -w -u -w -r1.9 QueryParser.jj
--- QueryParser.jj      17 Jan 2002 02:49:22 -0000      1.9
+++ QueryParser.jj      21 Jan 2002 17:34:47 -0000
@@ -204,39 +204,6 @@
     }
   }
 
-  private Query getRangeQuery(String field, 
-                              Analyzer analyzer, 
-                              String queryText, 
-                              boolean inclusive) 
-  {
-    // Use the analyzer to get all the tokens.  There should be 1 or 2.
-    TokenStream source = analyzer.tokenStream(field, 
-                                              new StringReader(queryText));
-    Term[] terms = new Term[2];
-    org.apache.lucene.analysis.Token t;
-
-    for (int i = 0; i < 2; i++)
-    {
-      try 
-      {
-        t = source.next();
-      } 
-      catch (IOException e) 
-      {
-        t = null;
-      }
-      if (t != null)
-      {
-        String text = t.termText();
-        if (!text.equalsIgnoreCase("NULL"))
-        {
-          terms[i] = new Term(field, text);
-        }
-      }
-    }
-    return new RangeQuery(terms[0], terms[1], inclusive);
-  }
-
   public static void main(String[] args) throws Exception {
     QueryParser qp = new QueryParser("field", 
                            new org.apache.lucene.analysis.SimpleAnalyzer());
@@ -287,8 +254,10 @@
 | <PREFIXTERM:  <_TERM_START_CHAR> (<_TERM_CHAR>)* "*" >
 | <WILDTERM:  <_TERM_START_CHAR> 
               (<_TERM_CHAR> | ( [ "*", "?" ] ))* >
-| <RANGEIN:   "[" ( ~[ "]" ] )+ "]">
-| <RANGEEX:   "{" ( ~[ "}" ] )+ "}">
+| <RANGE_IN_OPEN:   "[" >
+| <RANGE_IN_CLOSE:  "]" >
+| <RANGE_EX_OPEN:   "{" >
+| <RANGE_EX_CLOSE:  "}" >
 }
 
 <Boost> TOKEN : {
@@ -371,7 +340,7 @@
     
 
 Query Term(String field) : { 
-  Token term, boost=null;
+  Token term, boost=null, term2=null;
   boolean prefix = false;
   boolean wildcard = false;
   boolean fuzzy = false;
@@ -399,11 +368,20 @@
        else
          q = getFieldQuery(field, analyzer, term.image); 
      }
-     | ( term=<RANGEIN> { rangein=true; } | term=<RANGEEX> )
-       [ <CARAT> boost=<NUMBER> ]
+     | (
+        (<RANGE_IN_OPEN> { rangein=true; }
+         term=<TERM> 
+         (<MINUS> term2=<TERM>)
+         <RANGE_IN_CLOSE>)
+        |
+        (<RANGE_EX_OPEN>
+         term=<TERM> 
+         (<MINUS> term2=<TERM>)
+        <RANGE_EX_CLOSE> )
+       )
         {
-          q = getRangeQuery(field, analyzer, 
-                            term.image.substring(1, term.image.length()-1),
+       q = new RangeQuery(new Term(field, term.image),
+                          new Term(field, term2.image),
                             rangein);
         }
      | term=<QUOTED> 
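
For readers who do not want to regenerate the parser, here is a minimal sketch of the query shape the patched grammar appears to accept: an opening bracket, a term, a minus, a second term, and the matching closing bracket, with both terms taken verbatim. It uses plain java.util.regex rather than JavaCC; the class RawRangeSketch and its parse method are hypothetical, and requiring whitespace around the "-" is an assumption of the sketch, not something the grammar above states.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the patched grammar rule; not part of Lucene.
class RawRangeSketch {

    // "[" low "-" high "]" is inclusive, "{" low "-" high "}" is exclusive.
    // Whitespace around "-" is required here (an assumption of this sketch).
    private static final Pattern INCLUSIVE =
        Pattern.compile("\\[\\s*(\\S+)\\s+-\\s+(\\S+)\\s*\\]");
    private static final Pattern EXCLUSIVE =
        Pattern.compile("\\{\\s*(\\S+)\\s+-\\s+(\\S+)\\s*\\}");

    // Returns {low, high, "inclusive" or "exclusive"}, or null when the
    // text is not a well-formed range. The bounds are returned untouched:
    // no lower-casing, no stemming.
    static String[] parse(String text) {
        Matcher in = INCLUSIVE.matcher(text);
        if (in.matches()) {
            return new String[] { in.group(1), in.group(2), "inclusive" };
        }
        Matcher ex = EXCLUSIVE.matcher(text);
        if (ex.matches()) {
            return new String[] { ex.group(1), ex.group(2), "exclusive" };
        }
        return null; // includes mismatched brackets such as "[a - b}"
    }

    public static void main(String[] args) {
        String[] r = parse("[Intens - intenz]");
        System.out.println(r[0] + " .. " + r[1] + " (" + r[2] + ")");
    }
}
```

The point of the sketch is only that the bounds reach the range query exactly as typed, so "Intens" stays "Intens" instead of being lower-cased or stemmed.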

