Re: default AND operator

Mark Miller Sun, 17 Sep 2006 08:34:41 -0700

3 docs with one field each in index:
-------------------------------------
french beast stone
crazy rolling stone
rolling stone done in by coconut


3 searches, default op set as AND
-------------------------------------
search("coconut stone");
search("coconut OR stone");
search("coconut AND stone");

3 results:
--------------------------------------
query: +allFields:coconut +allFields:stone
Found 1 document(s) (in 31 milliseconds) that matched query 'coconut stone':

query: allFields:coconut allFields:stone
Found 3 document(s) (in 0 milliseconds) that matched query 'coconut OR stone':

query: +allFields:coconut +allFields:stone
Found 1 document(s) (in 16 milliseconds) that matched query 'coconut AND stone':

Do you not find this to be true? Your analyzer should not be a problem, as the QueryParser only analyzes text that is not QueryParser syntax (keywords such as AND and OR are never passed to the analyzer).


Code follows:

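// Imports for the Lucene 2.0-era API used below.
import java.io.IOException;
import java.util.Date;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.queryParser.QueryParser.Operator;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.RAMDirectory;

public class Tester {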
   private static RAMDirectory directory;

   private static Analyzer analyzer;

   public static void main(String[] args) {
       setupIndex();

       try {

           search("coconut stone");
           search("coconut OR stone");
           search("coconut AND stone");
       } catch (Exception e) {
           // TODO Auto-generated catch block
           e.printStackTrace();
       }
   }

   private static void setupIndex() {
       directory = new RAMDirectory();

       analyzer = new WhitespaceAnalyzer();

       IndexWriter writer;

       try {
           writer = new IndexWriter(directory, analyzer, true);

           Document doc = new Document();
           doc.add(new Field("allFields",
                   "french beast stone",
                   Field.Store.NO, Field.Index.TOKENIZED));

           writer.addDocument(doc);


           doc = new Document();
           doc.add(new Field("allFields", "crazy rolling stone",
                   Field.Store.NO, Field.Index.TOKENIZED));
           writer.addDocument(doc);

           doc = new Document();
           doc.add(new Field("allFields", "rolling stone done in by coconut",
                   Field.Store.NO, Field.Index.TOKENIZED));
           writer.addDocument(doc);


           writer.close();
       } catch (IOException e) {
           // TODO Auto-generated catch block
           e.printStackTrace();
       }
   }

   public static int search(String q) throws Exception {

       IndexSearcher is = new IndexSearcher(directory);

       QueryParser qp = new QueryParser("allFields", analyzer);

       // Terms with no explicit operator between them are combined with AND.
       qp.setDefaultOperator(Operator.AND);
       Query query = qp.parse(q);
       long start = new Date().getTime();

       Hits hits = is.search(query);
       long end = new Date().getTime();
       System.err.println("\nquery: " + query.toString());
       System.err.println("Found " + hits.length() + " document(s) (in " +
               (end - start) + " milliseconds) that matched query '" + q + "':");
       return hits.length();
   }
}
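
For anyone without a Lucene jar at hand, the operator handling discussed in this thread can be approximated with a small stand-alone toy. This is only a sketch, not Lucene's implementation: the class ToyQueryParser, its parse method and the stop-word set are made up for illustration, the "analyzer" is assumed to do nothing but lower-case and drop stop words (no stemming), and modifiers, phrases and fields are ignored. It shows three things: only upper-case AND / OR are treated as operators, a lower-case "and" is just a word that a stop-word filter may throw away, and the default operator applies only where no explicit operator was typed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of default-operator handling; hypothetical, NOT Lucene code. */
class ToyQueryParser {

    /**
     * Renders a whitespace-separated query as a string of terms,
     * with '+' marking the terms that are required.
     */
    static String parse(String q, boolean defaultAnd, Set<String> stopWords) {
        List<String> terms = new ArrayList<>();
        List<Boolean> required = new ArrayList<>();
        String conj = null; // explicit operator seen since the last kept term

        for (String tok : q.trim().split("\\s+")) {
            if (tok.isEmpty()) {
                continue;
            }
            // Only exact upper-case AND / OR count as operators.
            if (tok.equals("AND") || tok.equals("OR")) {
                conj = tok;
                continue;
            }
            // Everything else goes through the assumed "analyzer":
            // lower-case the token, then drop it if it is a stop word.
            String term = tok.toLowerCase();
            if (stopWords.contains(term)) {
                continue;
            }
            // An explicit operator can also change the previous term.
            if (!terms.isEmpty()) {
                int prev = required.size() - 1;
                if ("AND".equals(conj)) {
                    required.set(prev, true);
                } else if ("OR".equals(conj) && defaultAnd) {
                    required.set(prev, false);
                }
            }
            // With default AND a term is required unless OR precedes it;
            // with default OR it is required only if AND precedes it.
            boolean req = defaultAnd ? !"OR".equals(conj) : "AND".equals(conj);
            terms.add(term);
            required.add(req);
            conj = null;
        }

        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < terms.size(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            if (required.get(i)) {
                sb.append('+');
            }
            sb.append(terms.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("and", "or"));
        System.out.println(parse("french AND antiques", false, stop)); // +french +antiques
        System.out.println(parse("french and antiques", false, stop)); // french antiques
        System.out.println(parse("french antiques", true, stop));      // +french +antiques
        System.out.println(parse("coconut OR stone", true, stop));     // coconut stone
    }
}
```

In this toy, the second call gives the same shape as the "contents:french contents:antiqu" debug output quoted below, which is what one would expect if the string reaching the parser contained a lower-case "and" and the default operator was still OR.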

Erick Erickson wrote:

Are you really, really sure that your *analyzer* isn't automatically
lower-casing your *query* and turning "french AND antiques" into "french and
antiques", then, as Chris says, treating "and" as a stop word?

The fact that your parser transforms "antiques" into "antiqu" leads me to
suspect that there's a lot more going on in the parser's analyzer than you
might expect....

And, in case you haven't already found it, are you sure what your index
contains? I've found Luke (google luke lucene) to be very valuable for these
kinds of questions, particularly your issue about stemming etc.

Best
Erick

On 9/17/06, no spam <[EMAIL PROTECTED]> wrote:


When I use "french AND antiques" I get documents like this :

score: 1.0, boost: 1.0, cont: French Antiques
score: 0.23080501, boost: 1.0, cont: FRENCH SEPTIC
score: 0.23080501, boost: 1.0, cont: French & French Septic
score: 0.20400475, boost: 1.0, id: 25460, cont: French & Associates

As in the first e-mail the Query object shows these terms:

contents:french contents:antiqu <---- using string "french AND antiques"


when using Operator.AND it shows these:

+contents:french +contents:antiqu      <----- here I used "french antiques"

The second query matches NONE of the documents above, and in fact matches
anything at all only if I do synonym expansion with stemming.

*****My big question here is: why doesn't Operator.AND force both of
these queries to be identical? These will be user-typed queries, so I want
Lucene to force the use of AND so I don't have to search/replace.


On 9/16/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:
>
> can you be more specific about what it is you "expect", and what exactly
> searchTerms is in your examples?  (presumably it's a string, is it the
> string "french AND antiques" ... are you sure it's not "french and
> antiques" ? ... QueryParser only respects AND and OR if they are
> capitalized, otherwise they are treated as normal words, which are
> probably StopWords to your analyzer .. in which case everything you've
> shown makes perfect sense to me.)
>
>
> :
> :   stemParser = new QueryParser("contents", stemmingAnalyzer);
> :   Query query = stemParser.parse(searchTerms);
> :   Hits docHits = searcher.search(query);
> :
> : Debug from query shows: contents:french contents:antiqu  ... I would have
> : expected to see '+' before contents.
> :
> : But not if I try the query again with "french antiques" with this code ...
> : which sets the default operator to AND:
> :
> :   stemParser = new QueryParser("contents", stemmingAnalyzer);
> :   stemParser.setDefaultOperator(QueryParser.Operator.AND);
> :   Query query = stemParser.parse(searchTerms);
> :   Hits docHits = searcher.search(query);
> :
> : Debug from Query shows this:  +contents:french +contents:antiqu
> :
>
>
>
> -Hoss
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>

