[ https://issues.apache.org/jira/browse/LUCENE-8340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16602299#comment-16602299 ]
Adrien Grand commented on LUCENE-8340:
--------------------------------------
So I went back to this patch and did some testing. I played with the
wikimedium10m dataset and the following query (note that I had to do a hack to
also index "lastModNDV" with a LongPoint):
{code:java}
Query boostedQ = new BooleanQuery.Builder()
    .add(new TermQuery(new Term("body", "ref")), Occur.MUST)
    // Pivot distance of 1 day: a document 1 day away from the origin scores 0.5.
    .add(LongPoint.newDistanceFeatureQuery("lastModNDV", 1f,
        1335997132000L, 24 * 3600 * 1000), Occur.SHOULD)
    .build();
{code}
The maximum score of the term query is 2.07. The maximum score of the distance
query is 1, and there are 582,764 documents whose timestamp is in
[1335997132000L - 24 * 3600 * 1000, 1335997132000L + 24 * 3600 * 1000], meaning
their score is in [0.5, 1].
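To make the [0.5, 1] range concrete, here is a minimal sketch of the score shape, assuming the distance query scores a document as boost * pivot / (pivot + |value - origin|). The class and method names are hypothetical and are not part of the patch; the sketch only reproduces the arithmetic.

```java
/** Hypothetical helper, not part of Lucene or the patch: reproduces the assumed score shape. */
final class DistanceFeatureScore {

  /**
   * Assumed formula: boost * pivot / (pivot + |value - origin|).
   * The subtraction is done on doubles so it cannot overflow a long.
   */
  static double score(double boost, long origin, long pivotDistance, long value) {
    double distance = Math.abs((double) value - (double) origin);
    return boost * (pivotDistance / (pivotDistance + distance));
  }

  public static void main(String[] args) {
    long origin = 1335997132000L;
    long oneDay = 24L * 3600 * 1000;
    // Exactly at the origin: the maximum score, equal to the boost.
    System.out.println(score(1.0, origin, oneDay, origin));
    // Exactly one pivot distance (1 day) away: half the boost.
    System.out.println(score(1.0, origin, oneDay, origin - oneDay));
    // Three days away: a quarter of the boost.
    System.out.println(score(1.0, origin, oneDay, origin + 3 * oneDay));
  }
}
```

Under that assumption, every timestamp within one day of the origin scores at least half the boost, and the score decays smoothly beyond that rather than dropping to zero.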
When computing the top 10 matches and counting hits, all 3,793,973 hits must be
visited and points are never read. This takes about 99ms.
When computing the top 10 matches but not counting hits (totalHitsThreshold=1),
only 264,802 hits are collected (7% of matches) and the query runs in 29ms.
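One way to see why hits can be skipped when they are not counted: once a minimum competitive score for the distance clause is known, the score formula can be inverted into a maximum distance from the origin, and only timestamps inside that range still need to be looked up in the points. The sketch below assumes the score shape boost * pivot / (pivot + distance) and assumes that the contribution of the other clauses has already been subtracted from the minimum score. The names are hypothetical, and this is not taken from the patch.

```java
/** Hypothetical helper, not part of Lucene or the patch. */
final class CompetitiveDistance {

  /**
   * Largest distance d such that boost * pivot / (pivot + d) >= minScore,
   * i.e. d = pivot * (boost / minScore - 1). Requires 0 < minScore <= boost.
   */
  static double maxDistance(double boost, long pivotDistance, double minScore) {
    if (minScore <= 0 || minScore > boost) {
      throw new IllegalArgumentException("minScore must be in (0, boost], got " + minScore);
    }
    return pivotDistance * (boost / minScore - 1.0);
  }

  public static void main(String[] args) {
    long oneDay = 24L * 3600 * 1000;
    // A minimum score of 0.5 with boost 1 keeps exactly the documents within 1 day.
    System.out.println(maxDistance(1.0, oneDay, 0.5));
    // A minimum score of 0.25 widens the range to 3 days.
    System.out.println(maxDistance(1.0, oneDay, 0.25));
  }
}
```

As the minimum competitive score rises during collection, the range shrinks, which is consistent with only a small fraction of the matches being collected in the run above.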
If I switch to more costly queries that have fewer hits, the speedup
decreases, or unfortunately even becomes a slowdown. That said, I don't think
this should prevent us from adding something like that, which is a useful
addition to the scoring toolbox.
> Allow to boost by recency
> -------------------------
>
> Key: LUCENE-8340
> URL: https://issues.apache.org/jira/browse/LUCENE-8340
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
> Attachments: LUCENE-8340.patch
>
>
> I would like us to support something like
> {{FeatureField.newSaturationQuery}} but that works with features that are
> computed dynamically like recency or geo-distance, and is still optimized for
> top-hits collection. I'm starting with recency because it makes things a bit
> easier even though I suspect that geo-distance might be a more common need.