RE: How to properly correlate relevance in a search across multiple collections

Baldwin, David Mon, 08 Sep 2014 16:15:07 -0700

I am looking at the MultiSearcher, which seems to have been around for a while 
(at least since 3.0.3), and I am wondering whether it will do what I want. I just 
looked at the Lucene docs again, and they state that it searches multiple indexes 
and merges the results. I also see a lot of similar comments about scores not 
being comparable from one index to another, so I am confused. Does anyone have 
any additional thoughts on MultiSearcher? From reading Lucene in Action, it 
looks like it does what I want it to do.


-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Monday, September 08, 2014 10:31 AM
To: java-user
Subject: Re: How to properly correlate relevance in a search across multiple 
collections

I think the point got lost in the discussion. Raw scores are simply _not_ 
comparable across different collections. They aren't even comparable across 
different queries against the _same_ collection. They are _only_ meaningful for 
ranking documents in the same collection within a single query.
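
To make this concrete, here is a small plain-Java sketch of one collection-dependent factor in classic Lucene TF-IDF scoring: inverse document frequency. It assumes the formula idf = 1 + ln(numDocs / (docFreq + 1)), which is assumed here to match `DefaultSimilarity.idf` in the 3.x/4.x line; the class name and collection sizes are hypothetical. The same term, matching the same number of documents, gets a very different weight in a small collection than in a large one.

```java
public class IdfDemo {
    // Classic Lucene-style idf; assumed to mirror DefaultSimilarity.idf(docFreq, numDocs).
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        // Hypothetical collections: the same term occurs in 10 documents of each.
        double small = idf(10, 1_000);
        double large = idf(10, 100_000);
        System.out.printf("small collection idf = %.3f%n", small);
        System.out.printf("large collection idf = %.3f%n", large);
        // The same match carries almost twice the weight in the larger collection,
        // so raw scores from the two collections cannot be compared directly.
    }
}
```

Real scores also depend on term frequency, length norms and query normalization, so this isolates only one of the factors that differ between collections.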

And even then raw scores don't tell you much. A score of 2 isn't "twice as 
good" as a score of 1, it's just "somewhat better".

So the bottom line is that you have to resort to some kind of clever 
presentation of the different groups to the user: tabs for each collection, 
round-robin inclusion, or meta-analysis where you query the _same_ docs that 
exist in different indexes and try to create some satisfactory heuristic, etc., 
as atawfik suggested.
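
Round-robin inclusion is simple to sketch. The plain-Java example below assumes each collection has already returned its hits as a ranked list of document ids (the class name, method name and data are hypothetical). It takes the top hit from each collection, then the second, and so on, skipping lists that have run out.

```java
import java.util.ArrayList;
import java.util.List;

public class RoundRobinMerge {
    // Interleave ranked lists: rank 1 from every collection, then rank 2, and so on.
    // Lists that run out are simply skipped in later rounds.
    static List<String> interleave(List<List<String>> rankedLists) {
        int longest = rankedLists.stream().mapToInt(List::size).max().orElse(0);
        List<String> merged = new ArrayList<>();
        for (int rank = 0; rank < longest; rank++) {
            for (List<String> hits : rankedLists) {
                if (rank < hits.size()) {
                    merged.add(hits.get(rank));
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Hypothetical ranked hits from three collections that share no documents.
        List<String> docs = List.of("doc-a1", "doc-a2", "doc-a3");
        List<String> tickets = List.of("ticket-b1", "ticket-b2");
        List<String> wiki = List.of("wiki-c1");
        // prints [doc-a1, ticket-b1, wiki-c1, doc-a2, ticket-b2, doc-a3]
        System.out.println(interleave(List.of(docs, tickets, wiki)));
    }
}
```

The order in which the collections are visited is an arbitrary choice, and it decides which collection wins each round.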

Best,
Erick

On Mon, Sep 8, 2014 at 8:59 AM, Baldwin, David <david_bald...@bmc.com> wrote:
> Would it be possible to use the raw score from each separate collection to 
> order its results and then, after a merge, come up with an overall relevancy? 
> Does anyone have experience doing this?
>
> -----Original Message-----
> From: atawfik [mailto:contact.txl...@gmail.com]
> Sent: Sunday, September 07, 2014 9:50 AM
> To: java-user@lucene.apache.org
> Subject: Re: How to properly correlate relevance in a search across 
> multiple collections
>
> Hi,
>
> If you have documents that might exist in multiple collections, then 
> you can use techniques from meta-search, that is, combining multiple 
> search results from different collections. In this case, you can 
> retrieve the top 100 or 1000 documents from each collection and merge them. 
> You then rank the documents using an aggregation method. Using the sum of 
> the relevance scores is known to produce good results.
>
> If there are no shared documents between collections, you can still use the 
> same approach, but with different aggregation methods. One method is round 
> robin: you start by selecting the first-ranked document from each collection, 
> then the second-ranked document from each, and so on.
>
> If that does not fit your needs, you should probably search for "federated 
> search" or "aggregated search" techniques. These techniques are used by large 
> search engines to combine results from their different parts (images, video, 
> and web). You can find a lot of academic resources on these topics.
>
> Regards
> Ameer
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-to-properly-correlate-relevance
> -in-a-search-across-multiple-collections-tp4157240p4157321.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>
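
For the case atawfik describes, where the same documents appear in several collections, here is a minimal plain-Java sketch of sum-of-scores aggregation (often called CombSUM in the meta-search literature). Because raw scores from different collections are not directly comparable, the sketch first rescales each collection's scores to [0, 1] with min-max normalization; that step is an assumption added here, not something proposed in the thread. The class name, method names, document ids and scores are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NormalizedCombSum {
    // Rescale one collection's scores to [0, 1]; if every score is equal, map them all to 1.0.
    static Map<String, Double> minMaxNormalize(Map<String, Double> scores) {
        double min = scores.values().stream().mapToDouble(Double::doubleValue).min().orElse(0.0);
        double max = scores.values().stream().mapToDouble(Double::doubleValue).max().orElse(0.0);
        double range = max - min;
        Map<String, Double> normalized = new HashMap<>();
        scores.forEach((docId, s) -> normalized.put(docId, range == 0.0 ? 1.0 : (s - min) / range));
        return normalized;
    }

    // CombSUM: add up each document's normalized scores across the collections that returned it.
    // A document missing from a collection simply contributes nothing for that collection.
    static List<Map.Entry<String, Double>> fuse(List<Map<String, Double>> perCollection) {
        Map<String, Double> totals = new HashMap<>();
        for (Map<String, Double> scores : perCollection) {
            minMaxNormalize(scores).forEach((docId, s) -> totals.merge(docId, s, Double::sum));
        }
        // Highest fused score first; ties broken by document id so the order is deterministic.
        List<Map.Entry<String, Double>> ranked = new ArrayList<>(totals.entrySet());
        ranked.sort(Map.Entry.<String, Double>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Double>comparingByKey()));
        return ranked;
    }

    public static void main(String[] args) {
        // Top hits from two hypothetical collections, on very different raw score scales.
        Map<String, Double> collectionA = Map.of("doc1", 8.0, "doc2", 5.0, "doc3", 2.0);
        Map<String, Double> collectionB = Map.of("doc2", 0.9, "doc4", 0.6, "doc1", 0.3);
        for (Map.Entry<String, Double> e : fuse(List.of(collectionA, collectionB))) {
            System.out.printf("%s %.2f%n", e.getKey(), e.getValue());
        }
    }
}
```

Without the normalization step, collection A's larger raw scores would dominate the sum (doc1 would come first); with it, doc2 comes first because it ranks highly in both collections. In practice each map would hold the top 100 or 1000 hits from its collection.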
