SpanQuery and BoostingTermQuery oddities

Grant Ingersoll Wed, 05 Aug 2009 07:03:12 -0700

A BoostingTermQuery (BTQ) is a SpanQuery.

If I run:
    IndexSearcher searcher = new IndexSearcher(dir, true);
    searcher.setSimilarity(payloadSimilarity); // set the similarity. Very important
    BoostingTermQuery btq = new BoostingTermQuery(new Term("body", "fox"));
    TopDocs topDocs = searcher.search(btq, 10);
    printResults(searcher, btq, topDocs);

I get, as expected, documents that contain "fox" with a payload boosted higher than those containing "fox" without a boost. (See [1] for full code.)

Output is:
Doc: doc=0 score=4.2344446
Explain: 4.234444 = (MATCH) fieldWeight(body:fox in 0), product of:
  7.071068 = (MATCH) btq, product of:
    0.70710677 = tf(phraseFreq=0.5)
    10.0 = scorePayload(...)
  1.9162908 = idf(body: fox=3)
  0.3125 = fieldNorm(field=body, doc=0)

Doc: doc=2 score=4.2344446
Explain: 4.234444 = (MATCH) fieldWeight(body:fox in 2), product of:
  7.071068 = (MATCH) btq, product of:
    0.70710677 = tf(phraseFreq=0.5)
    10.0 = scorePayload(...)
  1.9162908 = idf(body: fox=3)
  0.3125 = fieldNorm(field=body, doc=2)

Doc: doc=1 score=0.42344445
Explain: 0.42344445 = (MATCH) fieldWeight(body:fox in 1), product of:
  0.70710677 = (MATCH) btq, product of:
    0.70710677 = tf(phraseFreq=0.5)
    1.0 = scorePayload(...)
  1.9162908 = idf(body: fox=3)
  0.3125 = fieldNorm(field=body, doc=1)
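As a sanity check on the explain output above, the short sketch below multiplies the listed factors back together. It is only arithmetic on the numbers printed above, not Lucene code; the class and method names are made up for illustration, and the only library fact assumed is that DefaultSimilarity's tf is the square root of the frequency.

```java
public class BtqScoreCheck {
    // DefaultSimilarity.tf(freq) is the square root of the frequency.
    public static double tf(double freq) {
        return Math.sqrt(freq);
    }

    // Product of the factors listed in the explain output for the BTQ:
    // tf * scorePayload * idf * fieldNorm.
    public static double btqScore(double freq, double payload, double idf, double fieldNorm) {
        return tf(freq) * payload * idf * fieldNorm;
    }

    public static void main(String[] args) {
        double idf = 1.9162908;    // idf(body: fox=3), copied from the output
        double fieldNorm = 0.3125; // fieldNorm(field=body), copied from the output
        System.out.println("payload 10.0: " + btqScore(0.5, 10.0, idf, fieldNorm));
        System.out.println("payload  1.0: " + btqScore(0.5, 1.0, idf, fieldNorm));
    }
}
```

The two results differ by exactly the payload factor of 10, which is the boost the BTQ is expected to apply.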

However, if I then add the BTQ to a SpanNearQuery, I do not get the expected results:

    SpanQuery[] queries = new SpanQuery[2];
    queries[0] = new BoostingTermQuery(new Term("body", "red"));
    queries[1] = new BoostingTermQuery(new Term("body", "fox"));
    SpanNearQuery near = new SpanNearQuery(queries, 2, true);
    topDocs = searcher.search(near, 10);
    printResults(searcher, near, topDocs);

Output is:
Doc: doc=0 score=0.6914818
Explain: 0.6914818 = (MATCH) fieldWeight(body:spanNear([red, fox], 2, true) in 0), product of:
  0.57735026 = tf(phraseFreq=0.33333334)
  3.8325815 = idf(body: fox=3 red=3)
  0.3125 = fieldNorm(field=body, doc=0)

Doc: doc=1 score=0.6914818
Explain: 0.6914818 = (MATCH) fieldWeight(body:spanNear([red, fox], 2, true) in 1), product of:
  0.57735026 = tf(phraseFreq=0.33333334)
  3.8325815 = idf(body: fox=3 red=3)
  0.3125 = fieldNorm(field=body, doc=1)

Doc: doc=2 score=0.6914818
Explain: 0.6914818 = (MATCH) fieldWeight(body:spanNear([red, fox], 2, true) in 2), product of:
  0.57735026 = tf(phraseFreq=0.33333334)
  3.8325815 = idf(body: fox=3 red=3)
  0.3125 = fieldNorm(field=body, doc=2)
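To make the missing factor explicit, here is the same kind of arithmetic check for the SpanNearQuery output. Again this is a sketch using only the printed numbers, with made-up class and method names. It assumes, as I read the span scoring code, that phraseFreq comes from DefaultSimilarity.sloppyFreq(distance) = 1 / (distance + 1) applied to a match length of 2.

```java
public class SpanNearScoreCheck {
    // DefaultSimilarity.sloppyFreq(distance) is 1 / (distance + 1).
    public static double sloppyFreq(int distance) {
        return 1.0 / (distance + 1);
    }

    // Product of the factors in the SpanNearQuery explain output:
    // tf * idf * fieldNorm. Note that there is no scorePayload term.
    public static double spanNearScore(double freq, double idfSum, double fieldNorm) {
        return Math.sqrt(freq) * idfSum * fieldNorm;
    }

    public static void main(String[] args) {
        double idfSum = 3.8325815;   // idf(body: fox=3 red=3), copied from the output
        double fieldNorm = 0.3125;   // fieldNorm(field=body), copied from the output
        double freq = sloppyFreq(2); // assumed match length of 2, giving 0.333...
        System.out.println("spanNear score: " + spanNearScore(freq, idfSum, fieldNorm));
    }
}
```

The idf here is just twice the single-term idf from the BTQ run (2 * 1.9162908), and nothing in the product depends on the payload, which is why all three documents tie.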

It seems the BTQ score method is not being called. One of the main points of the SpanNearQuery is that it can take in complex subclauses, presumably rolling up scores from the subclauses. Yet that appears not to be the case. Instead, it seems to rely only on the matches produced by those subclauses, not on their scoring. Is my understanding correct? If so, is that the correct functionality?

I'm not a spans expert (SpanNearQuery always confuses me with the NearSpansOrdered/Unordered), but it seems like the SpanNearQuery (and likely others that take clauses) needs to create a QueryWeight object that is made up of the QueryWeight objects from its subclauses, right?
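To illustrate the kind of roll-up I have in mind, here is a toy sketch in plain Java. None of these types exist in Lucene: ClausePayload and rolledUp are hypothetical names, and averaging the subclause payload factors is just one arbitrary way to combine them, not a proposal for the actual formula.

```java
import java.util.Arrays;
import java.util.List;

public class RolledUpNearScore {
    /** Hypothetical stand-in for a subclause that can report a payload factor. Not a Lucene type. */
    public interface ClausePayload {
        double payloadFactor(int doc);
    }

    /** Scales the match-only near score by the average payload factor of the subclauses. */
    public static double rolledUp(double nearScore, List<ClausePayload> clauses, int doc) {
        if (clauses.isEmpty()) {
            return nearScore;
        }
        double sum = 0.0;
        for (ClausePayload clause : clauses) {
            sum += clause.payloadFactor(doc);
        }
        return nearScore * (sum / clauses.size());
    }

    public static void main(String[] args) {
        ClausePayload red = doc -> 1.0;                   // assumption: "red" carries no boost
        ClausePayload fox = doc -> doc == 1 ? 1.0 : 10.0; // mirrors the BTQ output: docs 0 and 2 boosted
        List<ClausePayload> clauses = Arrays.asList(red, fox);
        for (int doc = 0; doc < 3; doc++) {
            System.out.println("doc=" + doc + " score=" + rolledUp(0.6914818, clauses, doc));
        }
    }
}
```

With something along these lines, docs 0 and 2 would again rank above doc 1 instead of all three tying.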


Thoughts?

Thanks,
Grant

[1]
import junit.framework.TestCase;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.payloads.DelimitedPayloadTokenFilter;
import org.apache.lucene.analysis.payloads.PayloadEncoder;
import org.apache.lucene.analysis.payloads.FloatEncoder;
import org.apache.lucene.analysis.payloads.PayloadHelper;
import org.apache.lucene.search.DefaultSimilarity;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.payloads.BoostingTermQuery;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;

import java.io.Reader;
import java.io.IOException;


/**
 *
 *
 **/
public class PayloadTest extends TestCase {
  Directory dir;



  public static String[] DOCS = {

          "Mary had a little lamb whose fleece was white as snow",

  };
  protected PayloadSimilarity payloadSimilarity;

  @Override
  protected void setUp() throws Exception {
    dir = new RAMDirectory();

    PayloadEncoder encoder = new FloatEncoder();

    IndexWriter writer = new IndexWriter(dir, new PayloadAnalyzer(encoder), true, IndexWriter.MaxFieldLength.UNLIMITED);

    payloadSimilarity = new PayloadSimilarity();
    writer.setSimilarity(payloadSimilarity);
    for (int i = 0; i < DOCS.length; i++) {
      Document doc = new Document();

      Field id = new Field("id", "doc_" + i, Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS);

      doc.add(id);
      // Analyze the body so that positions, and the payloads attached to them, get indexed
      Field text = new Field("body", DOCS[i], Field.Store.NO, Field.Index.ANALYZED);

      doc.add(text);
      writer.addDocument(doc);
    }
    writer.close();
  }


  public void testPayloads() throws Exception {
    IndexSearcher searcher = new IndexSearcher(dir, true);
    searcher.setSimilarity(payloadSimilarity); // set the similarity. Very important
    BoostingTermQuery btq = new BoostingTermQuery(new Term("body", "fox"));
    TopDocs topDocs = searcher.search(btq, 10);
    printResults(searcher, btq, topDocs);
    System.out.println("-----------");
    System.out.println("Try out some Spans");
    SpanQuery[] queries = new SpanQuery[2];
    queries[0] = new BoostingTermQuery(new Term("body", "red"));
    queries[1] = new BoostingTermQuery(new Term("body", "fox"));
    SpanNearQuery near = new SpanNearQuery(queries, 2, true);
    topDocs = searcher.search(near, 10);
    printResults(searcher, near, topDocs);

  }

  private void printResults(IndexSearcher searcher, Query btq, TopDocs topDocs) throws IOException {
    for (int i = 0; i < topDocs.scoreDocs.length; i++) {
      ScoreDoc doc = topDocs.scoreDocs[i];
      System.out.println("Doc: " + doc.toString());
      System.out.println("Explain: " + searcher.explain(btq, doc.doc));
    }
  }

  class PayloadSimilarity extends DefaultSimilarity {
    @Override
    public float scorePayload(String fieldName, byte[] bytes, int offset, int length) {
      // we can ignore length here, because we know it is encoded as 4 bytes
      return PayloadHelper.decodeFloat(bytes, offset);
    }
  }

  class PayloadAnalyzer extends Analyzer {
    private PayloadEncoder encoder;

    PayloadAnalyzer(PayloadEncoder encoder) {
      this.encoder = encoder;
    }

    public TokenStream tokenStream(String fieldName, Reader reader) {
      TokenStream result = new WhitespaceTokenizer(reader);
      result = new LowerCaseFilter(result);
      result = new DelimitedPayloadTokenFilter(result, '|', encoder);
      return result;
    }
  }
}
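For anyone unfamiliar with the payload side of the test, the following standalone sketch shows roughly what happens to a delimited token. It does not use the Lucene classes: the token "fox|10.0" is a hypothetical example, the class and method names are made up, and the 4-byte big-endian layout is my understanding of what FloatEncoder and PayloadHelper do rather than something verified here.

```java
import java.nio.ByteBuffer;

public class PayloadRoundTrip {
    // Encode a float as 4 big-endian bytes (ByteBuffer's default byte order).
    public static byte[] encodeFloat(float value) {
        return ByteBuffer.allocate(4).putFloat(value).array();
    }

    // Decode 4 bytes starting at offset back into a float.
    public static float decodeFloat(byte[] bytes, int offset) {
        return ByteBuffer.wrap(bytes, offset, 4).getFloat();
    }

    // Split "term|payload" on the delimiter. Returns 1.0f when the token has
    // no payload, mirroring the 1.0 scorePayload seen for the unboosted document.
    public static float payloadOf(String token, char delimiter) {
        int idx = token.indexOf(delimiter);
        return idx < 0 ? 1.0f : Float.parseFloat(token.substring(idx + 1));
    }

    public static void main(String[] args) {
        float boost = payloadOf("fox|10.0", '|'); // hypothetical token
        byte[] payload = encodeFloat(boost);
        System.out.println(payload.length + " bytes, decoded=" + decodeFloat(payload, 0));
    }
}
```

Because the encoded value is always exactly 4 bytes, a decoder only needs the offset, which is why scorePayload can ignore the length argument.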


