A BoostingTermQuery (BTQ) is a SpanQuery.
If I run:
IndexSearcher searcher = new IndexSearcher(dir, true);
searcher.setSimilarity(payloadSimilarity); // set the similarity. Very important
BoostingTermQuery btq = new BoostingTermQuery(new Term("body", "fox"));
TopDocs topDocs = searcher.search(btq, 10);
printResults(searcher, btq, topDocs);
I get, as expected, the documents where "fox" carries a payload
scored higher than the document where "fox" has no payload. (See [1]
for full code.)
Output is:
Doc: doc=0 score=4.2344446
Explain: 4.234444 = (MATCH) fieldWeight(body:fox in 0), product of:
  7.071068 = (MATCH) btq, product of:
    0.70710677 = tf(phraseFreq=0.5)
    10.0 = scorePayload(...)
  1.9162908 = idf(body: fox=3)
  0.3125 = fieldNorm(field=body, doc=0)
Doc: doc=2 score=4.2344446
Explain: 4.234444 = (MATCH) fieldWeight(body:fox in 2), product of:
  7.071068 = (MATCH) btq, product of:
    0.70710677 = tf(phraseFreq=0.5)
    10.0 = scorePayload(...)
  1.9162908 = idf(body: fox=3)
  0.3125 = fieldNorm(field=body, doc=2)
Doc: doc=1 score=0.42344445
Explain: 0.42344445 = (MATCH) fieldWeight(body:fox in 1), product of:
  0.70710677 = (MATCH) btq, product of:
    0.70710677 = tf(phraseFreq=0.5)
    1.0 = scorePayload(...)
  1.9162908 = idf(body: fox=3)
  0.3125 = fieldNorm(field=body, doc=1)
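As a sanity check, the numbers in this explain output can be reproduced
with a few lines of plain Java. This is a minimal sketch, assuming
DefaultSimilarity computes tf as sqrt(freq) and idf as
ln(numDocs / (docFreq + 1)) + 1, with numDocs=10 (the size of DOCS in [1])
and docFreq=3 (from "fox=3"). The class and method names are made up for
illustration and are not Lucene API; fieldNorm is taken from the output
as-is.

```java
// Reproduces the arithmetic of the single-term BTQ explain output.
// Plain Java, no Lucene dependency. The formulas are assumptions about
// DefaultSimilarity, not code copied from Lucene.
class BtqExplainArithmetic {

    // tf(freq) = sqrt(freq), matching the "tf(phraseFreq=0.5)" line
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // idf = ln(numDocs / (docFreq + 1)) + 1
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // fieldWeight = (tf * scorePayload) * idf * fieldNorm
    static float score(float payload) {
        float btq = tf(0.5f) * payload;    // the "btq, product of" line
        return btq * idf(3, 10) * 0.3125f; // 0.3125 = fieldNorm from the output
    }

    public static void main(String[] args) {
        // The explain output reports 4.2344446 for docs 0 and 2
        System.out.println("payload 10.0 -> " + score(10.0f));
        // ... and 0.42344445 for doc 1, where scorePayload is 1.0
        System.out.println("payload  1.0 -> " + score(1.0f));
    }
}
```

The only factor that differs between the documents is scorePayload, so
the tenfold gap in the scores comes entirely from the payload.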
However, if I then add the BTQ to a SpanNearQuery, I do not get the
expected results:
SpanQuery[] queries = new SpanQuery[2];
queries[0] = new BoostingTermQuery(new Term("body", "red"));
queries[1] = new BoostingTermQuery(new Term("body", "fox"));
SpanNearQuery near = new SpanNearQuery(queries, 2, true);
topDocs = searcher.search(near, 10);
printResults(searcher, near, topDocs);
Output is:
Doc: doc=0 score=0.6914818
Explain: 0.6914818 = (MATCH) fieldWeight(body:spanNear([red, fox], 2, true) in 0), product of:
  0.57735026 = tf(phraseFreq=0.33333334)
  3.8325815 = idf(body: fox=3 red=3)
  0.3125 = fieldNorm(field=body, doc=0)
Doc: doc=1 score=0.6914818
Explain: 0.6914818 = (MATCH) fieldWeight(body:spanNear([red, fox], 2, true) in 1), product of:
  0.57735026 = tf(phraseFreq=0.33333334)
  3.8325815 = idf(body: fox=3 red=3)
  0.3125 = fieldNorm(field=body, doc=1)
Doc: doc=2 score=0.6914818
Explain: 0.6914818 = (MATCH) fieldWeight(body:spanNear([red, fox], 2, true) in 2), product of:
  0.57735026 = tf(phraseFreq=0.33333334)
  3.8325815 = idf(body: fox=3 red=3)
  0.3125 = fieldNorm(field=body, doc=2)
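The phraseFreq and idf values in this output can be derived the same way,
which makes it easier to see that no payload factor enters the product.
This is a sketch under two assumptions that are not confirmed by the
output itself: that the span scorer passes the match length (span end
minus span start, so 2 for "red fox") to sloppyFreq, and that
DefaultSimilarity's sloppyFreq is 1 / (distance + 1). The names are
illustrative only, not Lucene API.

```java
// Derives the factors of the SpanNearQuery explain output.
// Plain Java, no Lucene dependency; formulas are assumed, see lead-in.
class SpanNearExplainArithmetic {

    // Assumed DefaultSimilarity.sloppyFreq: 1 / (distance + 1)
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    // tf(freq) = sqrt(freq)
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // idf = ln(numDocs / (docFreq + 1)) + 1
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // fieldWeight = tf(phraseFreq) * (idf(red) + idf(fox)) * fieldNorm
    static float spanNearScore() {
        float phraseFreq = sloppyFreq(2);        // 0.33333334 in the output
        float idfSum = idf(3, 10) + idf(3, 10);  // "fox=3 red=3", 10 docs
        return tf(phraseFreq) * idfSum * 0.3125f; // fieldNorm from the output
    }

    public static void main(String[] args) {
        // The explain output reports 0.6914818 for all three documents
        System.out.println("spanNear score -> " + spanNearScore());
    }
}
```

Nothing in this product depends on the payloads, which is consistent
with all three documents receiving the same score.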
It seems the BTQ score method is not being called. One of the main
points of the SpanNearQuery is that it can take in complex subclauses,
presumably rolling up scores from the subclauses. Yet that does not
appear to be the case. Instead, it seems to rely only on the matches
produced by those subclauses, not on their scoring. Is my
understanding correct? If so, is that the intended behavior?
I'm not a spans expert (SpanNearQuery always confuses me with the
NearSpansOrdered/Unordered), but it seems like the SpanNearQuery (and
likely others that take clauses) needs to create a QueryWeight object
that is made up of the QueryWeight objects from its subclauses, right?
Thoughts?
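To make the question concrete, here is a minimal sketch of one way
"rolling up" could work: multiply the span score by the average of the
per-clause payload factors. This is purely hypothetical. The averaging
rule, the class, and the method names are assumptions for illustration
and do not describe what SpanNearQuery or any Lucene class does. The
inputs are the span score from the output above and the payloads from
DOCS in [1] (red|2.0, fox|10.0), with 1.0 standing in for a clause that
has no payload.

```java
// Hypothetical roll-up of sub-clause payload factors into a span score.
// Not Lucene API; one possible design among many.
class PayloadRollUpSketch {

    // Average of the per-clause payload factors; 1.0 if there are none
    static float averagePayload(float[] clausePayloads) {
        if (clausePayloads.length == 0) {
            return 1.0f;
        }
        float sum = 0.0f;
        for (float p : clausePayloads) {
            sum += p;
        }
        return sum / clausePayloads.length;
    }

    // Scale the payload-agnostic span score by the averaged payload factor
    static float rolledUpScore(float spanScore, float[] clausePayloads) {
        return spanScore * averagePayload(clausePayloads);
    }

    public static void main(String[] args) {
        float spanScore = 0.6914818f; // from the SpanNearQuery explain output
        // doc 0: red|2.0 fox|10.0
        System.out.println("doc 0 -> " + rolledUpScore(spanScore, new float[] {2.0f, 10.0f}));
        // doc 1: no payloads, each clause contributes a neutral 1.0
        System.out.println("doc 1 -> " + rolledUpScore(spanScore, new float[] {1.0f, 1.0f}));
    }
}
```

Under this rule, docs 0 and 2 would score six times higher than doc 1
instead of tying with it. Whether an average, a maximum, or a product is
the right combination is a separate design question.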
Thanks,
Grant
[1]
import junit.framework.TestCase;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.payloads.DelimitedPayloadTokenFilter;
import org.apache.lucene.analysis.payloads.PayloadEncoder;
import org.apache.lucene.analysis.payloads.FloatEncoder;
import org.apache.lucene.analysis.payloads.PayloadHelper;
import org.apache.lucene.search.DefaultSimilarity;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.payloads.BoostingTermQuery;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import java.io.Reader;
import java.io.IOException;
/**
 * Demonstrates payload-based boosting with BoostingTermQuery, first on its
 * own and then as a clause of a SpanNearQuery.
 **/
public class PayloadTest extends TestCase {
  Directory dir;

  // Tokens of the form term|payload carry a float payload; plain tokens have none.
  public static String[] DOCS = {
    "The quick|2.0 red|2.0 fox|10.0 jumped|5.0 over the lazy|2.0 brown|2.0 dogs|10.0",
    "The quick red fox jumped over the lazy brown dogs", // no boosts
    "The quick|2.0 red|2.0 fox|10.0 jumped|5.0 over the old|2.0 brown|2.0 box|10.0",
    "Mary|10.0 had a little|2.0 lamb|10.0 whose fleece|10.0 was|5.0 white|2.0 as snow|10.0",
    "Mary had a little lamb whose fleece was white as snow",
    "Mary|10.0 takes on Wolf|10.0 Restoration|10.0 project|10.0 despite ties|10.0 to sheep|10.0 farming|10.0",
    "Mary|10.0 who lives|5.0 on a farm|10.0 is|5.0 happy|2.0 that she|10.0 takes|5.0 a walk|10.0 every day|10.0",
    "Moby|10.0 Dick|10.0 is|5.0 a story|10.0 of a whale|10.0 and a man|10.0 obsessed|10.0",
    "The robber|10.0 wore|5.0 a black|2.0 fleece|10.0 jacket|10.0 and a baseball|10.0 cap|10.0",
    "The English|10.0 Springer|10.0 Spaniel|10.0 is|5.0 the best|2.0 of all dogs|10.0"
  };

  protected PayloadSimilarity payloadSimilarity;

  @Override
  protected void setUp() throws Exception {
    dir = new RAMDirectory();
    PayloadEncoder encoder = new FloatEncoder();
    IndexWriter writer = new IndexWriter(dir, new PayloadAnalyzer(encoder), true,
        IndexWriter.MaxFieldLength.UNLIMITED);
    payloadSimilarity = new PayloadSimilarity();
    writer.setSimilarity(payloadSimilarity);
    for (int i = 0; i < DOCS.length; i++) {
      Document doc = new Document();
      Field id = new Field("id", "doc_" + i, Field.Store.YES,
          Field.Index.NOT_ANALYZED_NO_NORMS);
      doc.add(id);
      // Analyzed field; payloads are written alongside the term positions
      Field text = new Field("body", DOCS[i], Field.Store.NO, Field.Index.ANALYZED);
      doc.add(text);
      writer.addDocument(doc);
    }
    writer.close();
  }

  public void testPayloads() throws Exception {
    IndexSearcher searcher = new IndexSearcher(dir, true);
    searcher.setSimilarity(payloadSimilarity); // set the similarity. Very important

    // A single BoostingTermQuery: payloads affect the score
    BoostingTermQuery btq = new BoostingTermQuery(new Term("body", "fox"));
    TopDocs topDocs = searcher.search(btq, 10);
    printResults(searcher, btq, topDocs);

    System.out.println("-----------");
    System.out.println("Try out some Spans");

    // The same kind of query as clauses of a SpanNearQuery: payloads are ignored
    SpanQuery[] queries = new SpanQuery[2];
    queries[0] = new BoostingTermQuery(new Term("body", "red"));
    queries[1] = new BoostingTermQuery(new Term("body", "fox"));
    SpanNearQuery near = new SpanNearQuery(queries, 2, true);
    topDocs = searcher.search(near, 10);
    printResults(searcher, near, topDocs);
  }

  // Prints each hit followed by its score explanation
  private void printResults(IndexSearcher searcher, Query query, TopDocs topDocs)
      throws IOException {
    for (int i = 0; i < topDocs.scoreDocs.length; i++) {
      ScoreDoc doc = topDocs.scoreDocs[i];
      System.out.println("Doc: " + doc.toString());
      System.out.println("Explain: " + searcher.explain(query, doc.doc));
    }
  }

  class PayloadSimilarity extends DefaultSimilarity {
    @Override
    public float scorePayload(String fieldName, byte[] bytes, int offset, int length) {
      // We can ignore length here, because we know it is encoded as 4 bytes
      return PayloadHelper.decodeFloat(bytes, offset);
    }
  }

  class PayloadAnalyzer extends Analyzer {
    private PayloadEncoder encoder;

    PayloadAnalyzer(PayloadEncoder encoder) {
      this.encoder = encoder;
    }

    public TokenStream tokenStream(String fieldName, Reader reader) {
      TokenStream result = new WhitespaceTokenizer(reader);
      result = new LowerCaseFilter(result);
      // Splits each token at '|' and stores the encoded float as its payload
      result = new DelimitedPayloadTokenFilter(result, '|', encoder);
      return result;
    }
  }
}