Alon Lanyado created SOLR-5038:
----------------------------------
Summary: Diversity Search Result In Rank
Key: SOLR-5038
URL: https://issues.apache.org/jira/browse/SOLR-5038
Project: Solr
Issue Type: New Feature
Components: MoreLikeThis, SearchComponents - other
Environment: irrelevant
Reporter: Alon Lanyado
We would like to add a Diversity SearchComponent/RequestHandler for Solr.
We will implement MMR (Maximal Marginal Relevance), which is one of the
simplest algorithms for this problem; we will improve on it in the next version.
The idea is that your search results contain many similar documents
(duplicates and near-duplicates that you must index), and the ranking shows
all of those documents one after another. This is a very common problem for
organizations.
We need to return a larger list of documents from the searcher (its size is a
parameter that needs to be chosen based on system performance) and run the MMR
calculation on their scores:
lambda * OldRank + (1 - lambda) * min_similarity{similarity of the current
document to the subset of documents already chosen to return in search results}
lambda is a parameter between 0 and 1 that controls the strength of the
diversification (lambda = 1 keeps the original ranking).
min_similarity is calculated with the Lucene default similarity (TF-IDF)
against the subset of already chosen documents.
The new score will represent a combination of the relevance score and the
diversity from other documents.
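
To make the re-ranking step concrete, here is a minimal sketch in Python rather than a Solr component. It is not the proposed implementation. It follows the standard greedy MMR formulation, which subtracts the maximum similarity to the documents already chosen; the description above words that term differently, so treat the sign and the max as assumptions. The names mmr_rerank, jaccard and TOKENS are hypothetical, Jaccard overlap on token sets stands in for the Lucene TF-IDF similarity, and relevance scores are assumed to be normalized to the 0-1 range so they are comparable with the similarity values.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical toy corpus: doc id -> set of tokens.
# "a_dup" is an exact duplicate of "a".
TOKENS: Dict[str, Set[str]] = {
    "a": {"solr", "search", "rank"},
    "a_dup": {"solr", "search", "rank"},
    "b": {"lucene", "index"},
}


def jaccard(doc_a: str, doc_b: str) -> float:
    """Stand-in similarity in [0, 1]; a real component would use TF-IDF."""
    ta, tb = TOKENS[doc_a], TOKENS[doc_b]
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def mmr_rerank(
    candidates: List[Tuple[str, float]],
    similarity: Callable[[str, str], float],
    lam: float = 0.7,
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Greedy MMR over (doc_id, relevance) pairs.

    Each round picks the document maximizing
        lam * relevance - (1 - lam) * max similarity to already chosen docs.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be between 0 and 1")
    remaining = list(candidates)
    selected: List[Tuple[str, float]] = []
    while remaining and len(selected) < top_k:
        best_index, best_score = 0, float("-inf")
        for i, (doc, relevance) in enumerate(remaining):
            # Penalty is 0 for the first pick, since nothing is chosen yet.
            penalty = max((similarity(doc, s) for s, _ in selected), default=0.0)
            score = lam * relevance - (1.0 - lam) * penalty
            if score > best_score:
                best_index, best_score = i, score
        doc, _ = remaining.pop(best_index)
        selected.append((doc, best_score))
    return selected


# The searcher returns more candidates than needed, ordered by relevance.
candidates = [("a", 1.0), ("a_dup", 0.95), ("b", 0.6)]
diverse = [doc for doc, _ in mmr_rerank(candidates, jaccard, lam=0.5, top_k=3)]
original = [doc for doc, _ in mmr_rerank(candidates, jaccard, lam=1.0, top_k=3)]
print(diverse)
print(original)
```

With lam=0.5 the duplicate is pushed behind the unrelated document, while lam=1.0 reproduces the original relevance order. The loop costs O(n * k) similarity calls per pass for n candidates and k results, which is why the size of the candidate list has to be tuned against system performance.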