[
https://issues.apache.org/jira/browse/LUCENE-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13254163#comment-13254163
]
Benson Margulies commented on LUCENE-1999:
------------------------------------------
I have a potential application for this, and would be willing to work on it,
assuming that committers have any interest in committing the results.
Let me explain my particular case, which some of you may have seen discussed on
solr-users.
Imagine you wanted to search for documents based on some relatively expensive
similarity metric. Too expensive, by far, to want to run on every single
document in the index, or even all the documents that pass some filter first.
Further imagine that you come up with an approximation of the similarity metric
in terms of Lucene query capabilities. The approximation is an ordinary query
(e.g. no Solr Functions forcing a computation on each document), and it
approximates by having recall at least as high as the real metric, but lower
precision. Roughly, the top 200 hits based on the approximation will contain
the top 10 hits based on the real metric.
OK, well, then, you can run this query, retrieve documents, select the top hits,
and then run the real metric. You get the right answer for far lower CPU time.
And all of this works perfectly fine with Lucene (and Solr) as we know it.
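A minimal sketch of this two-stage idea, in plain Java with no Lucene
dependency. The names TwoStageRerank, Hit and rerank are hypothetical, and the
cheap ranking is assumed to arrive already sorted best-first, as a Lucene
search would return it. Only the first `candidates` hits are passed to the
expensive metric.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntToDoubleFunction;

/** Hypothetical illustration only; not Lucene or Solr API. */
final class TwoStageRerank {

    /** A document id paired with a score. */
    static final class Hit {
        final int docId;
        final double score;

        Hit(int docId, double score) {
            this.docId = docId;
            this.score = score;
        }
    }

    /**
     * Rescores the first `candidates` hits of the cheap ranking with the
     * expensive metric and returns the best `topN` of them, best first.
     */
    static List<Hit> rerank(List<Hit> cheapRanked, int candidates, int topN,
                            IntToDoubleFunction realMetric) {
        int limit = Math.min(candidates, cheapRanked.size());
        List<Hit> rescored = new ArrayList<>(limit);
        for (Hit h : cheapRanked.subList(0, limit)) {
            // The expensive metric runs only on the candidate set.
            rescored.add(new Hit(h.docId, realMetric.applyAsDouble(h.docId)));
        }
        // Sort by the real metric, highest score first.
        rescored.sort((a, b) -> Double.compare(b.score, a.score));
        return new ArrayList<>(rescored.subList(0, Math.min(topN, rescored.size())));
    }
}
```

With the numbers above this would be called as rerank(hits, 200, 10, metric).
The result is only as good as the recall of the approximation: a document
outside the candidate set is never rescored.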
However, imagine a further challenge. You want to combine the approximation
query with arbitrary other query terms -- and then fix up the scores in the top
documents to reflect the real metric.
Well, you can run a second query on just the approximation query to get its
score contribution, subtract it out, and add in (scaling here is a challenge)
the results of the real metric.
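The arithmetic of that fix-up can be sketched as follows. This is only an
illustration under assumptions: ScoreFixup, fixUp and reorder are hypothetical
names, the three score maps are assumed to have been collected already for the
same top documents, and the single `scale` factor stands in for the scaling
problem, which is left unsolved here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not Lucene or Solr API. */
final class ScoreFixup {

    /** Removes the approximation's contribution and adds the scaled real metric. */
    static double fixUp(double combinedScore, double approxScore,
                        double realMetric, double scale) {
        return combinedScore - approxScore + scale * realMetric;
    }

    /** Returns the doc ids of `combined`, ordered by corrected score, best first. */
    static List<Integer> reorder(Map<Integer, Double> combined,
                                 Map<Integer, Double> approx,
                                 Map<Integer, Double> real,
                                 double scale) {
        List<Integer> ids = new ArrayList<>(combined.keySet());
        ids.sort((a, b) -> Double.compare(
                fixUp(combined.get(b), approx.get(b), real.get(b), scale),
                fixUp(combined.get(a), approx.get(a), real.get(a), scale)));
        return ids;
    }
}
```

The cost of this route is the second query: the approximation's score
contribution has to be obtained separately for each top document before it can
be subtracted.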
Or, it seems to me, you could use the approach proposed here, perhaps extended
as discussed.
> Match spotter for all query types
> ---------------------------------
>
> Key: LUCENE-1999
> URL: https://issues.apache.org/jira/browse/LUCENE-1999
> Project: Lucene - Java
> Issue Type: New Feature
> Components: core/search
> Affects Versions: 2.9
> Reporter: Mark Harwood
> Attachments: matchflagger.patch
>
>
> Related to LUCENE-1929 and the current inability to highlight
> NumericRangeQuery, spatial, cached term filters and other exotica.
> This patch provides the ability to wrap *any* Query objects and record match
> info as flags encoded in the overall document score.
> Using this approach it would be possible to understand (and therefore
> highlight) which fields matched clauses in a query.
> The match encoding approach loses some precision in scores as noted here:
> http://tinyurl.com/ykt8nx7
> Avoiding these precision issues would require a change to Lucene core to
> record docId, score AND a matchFlag byte in ScoreDoc objects and collector
> APIs.
> This may be something we should consider.
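To make the precision trade-off in the quoted description concrete, here is one
possible way to pack match flags into a float score. It is a sketch under
assumptions and is not taken from matchflagger.patch: the class ScoreFlags, the
choice of 8 flag bits, and the use of the low mantissa bits are all
hypothetical.

```java
/** Hypothetical illustration only; not the encoding used by the patch. */
final class ScoreFlags {
    private static final int FLAG_BITS = 8;
    private static final int FLAG_MASK = (1 << FLAG_BITS) - 1;

    /** Overwrites the low mantissa bits of the score with the flag bits. */
    static float encode(float score, int flags) {
        int bits = Float.floatToRawIntBits(score);
        return Float.intBitsToFloat((bits & ~FLAG_MASK) | (flags & FLAG_MASK));
    }

    /** Reads the flag bits back out of an encoded score. */
    static int decodeFlags(float encoded) {
        return Float.floatToRawIntBits(encoded) & FLAG_MASK;
    }
}
```

In this sketch the score gives up its 8 least significant mantissa bits, so two
scores that differ only in those bits can no longer be told apart. Carrying the
flags in a separate field next to docId and score, as the description suggests,
would avoid that loss.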