Alon Lanyado created SOLR-5038:
----------------------------------

             Summary: Diversity Search Result In Rank
                 Key: SOLR-5038
                 URL: https://issues.apache.org/jira/browse/SOLR-5038
             Project: Solr
          Issue Type: New Feature
          Components: MoreLikeThis, SearchComponents - other
         Environment: irrelevant
            Reporter: Alon Lanyado


We would like to add a diversity SearchComponent/RequestHandler to Solr.
We will implement MMR (Maximal Marginal Relevance), one of the simplest
algorithms for this problem, and improve on it in a later version.

The idea is that a search result often contains many similar documents
(duplicates and near-duplicates that you must index), and the ranking shows
all of them one after another. This is a very common problem for organizations.

We need to fetch a larger list of documents from the searcher (the size is a
parameter to be chosen based on system performance) and then re-score them
with the MMR calculation:
lambda * OldRank + (1-lambda) * min_similarity{similarity of the current document
to the subset of documents already chosen to return in the search results}

lambda is a parameter between 0 and 1 that sets the strength of the diversity.
min_similarity is calculated with the Lucene default similarity (TF-IDF) against
the subset of already chosen documents.
The new score represents a combination of the relevance score and the diversity
from other documents.
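
As a rough illustration of the re-ranking step, here is a minimal Python sketch
of greedy MMR. It is not Solr code and not the proposed implementation. It
assumes the textbook MMR form, which subtracts the maximum similarity to the
already chosen documents (this differs from the expression as written above),
relevance scores normalized to the 0-1 range, and a simplified TF-IDF cosine
similarity standing in for Lucene's default similarity. The names
tfidf_vectors, cosine and mmr_rerank are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build simplified TF-IDF vectors (term -> weight) for tokenized docs."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: c * (1.0 + math.log(n / df[t])) for t, c in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_rerank(relevance, vectors, lam=0.5, k=None):
    """Greedy MMR: return document indices in re-ranked order.

    relevance: original scores, assumed normalized to [0, 1]
    vectors:   one sparse vector per document
    lam:       1.0 keeps the original ranking, lower values favor diversity
    k:         number of results to return (default: all)
    """
    k = len(relevance) if k is None else min(k, len(relevance))
    remaining = list(range(len(relevance)))
    selected = []
    while remaining and len(selected) < k:

        def mmr_score(i):
            # Redundancy: highest similarity to any document already chosen.
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected), default=0.0
            )
            return lam * relevance[i] - (1.0 - lam) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Documents 0 and 1 are exact duplicates; document 2 is unrelated.
docs = [
    ["solr", "search", "ranking"],
    ["solr", "search", "ranking"],
    ["lucene", "index", "merge"],
]
scores = [1.0, 0.9, 0.6]
print(mmr_rerank(scores, tfidf_vectors(docs), lam=0.5))  # [0, 2, 1]
```

With lam=0.5 the duplicate of the top hit is pushed below the unrelated
document, while lam=1.0 reproduces the original order. The candidate list here
plays the role of the larger list fetched from the searcher.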

