Hi Hannes,

Thanks for the info. Scoring methods like mutual information sound fun.

Best regards,
Valentin

On Friday, April 11, 2014 7:40:14 PM UTC+2, Hannes Korte wrote:
>
> Hi Valentin, 
>
> > - What is bg_count (I assume background count) but what is the meaning 
> > of it? 
>
> The bg_count is the number of documents in the whole index (not just in 
> the search result) that contain the term. 
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-significantterms-aggregation.html
>  
>
> > - At first I thought the score values are between 0 and 1 but there are 
> > much bigger values. Can anyone give me a rough explanation? 
>
> You can see the code of the computation here: 
>
>
> https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/search/aggregations/bucket/significant/InternalSignificantTerms.java#L94
>  
>
> This is a summarized version of the formula: 
>
>    double subsetProb = #relative frequency in the search result#; 
>    double supersetProb = #relative frequency in the whole index#; 
>    double absoluteProbChange = subsetProb - supersetProb; 
>    if (absoluteProbChange <= 0) { 
>      return 0; 
>    } 
>    double relativeProbChange = (subsetProb / supersetProb); 
>    return absoluteProbChange * relativeProbChange; 
>
> I guess in the future there will be support for other scoring methods 
> like mutual information, chi-squared or information gain. 
>
> Best regards, 
> Hannes 
>
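
To make the summarized formula above concrete, here is a minimal Java sketch of it. The class name SignificanceScoreSketch, the method score, and its four count parameters are illustrative assumptions and not the Elasticsearch API; the early return for zero counts is likewise an assumption added so the sketch never divides by zero. It also shows why scores are not limited to the range 0 to 1: the absolute change is at most 1, but it is multiplied by the ratio of the two frequencies, which is unbounded.

```java
// Hypothetical helper for illustration only; names are not from Elasticsearch.
class SignificanceScoreSketch {

    /**
     * @param subsetFreq   documents in the search result that contain the term
     * @param subsetSize   total documents in the search result
     * @param supersetFreq documents in the whole index that contain the term (bg_count)
     * @param supersetSize total documents in the whole index
     */
    static double score(long subsetFreq, long subsetSize,
                        long supersetFreq, long supersetSize) {
        // Assumption: treat empty sets or an unseen term as "not significant".
        if (subsetSize <= 0 || supersetSize <= 0 || supersetFreq <= 0) {
            return 0;
        }
        // Relative frequency in the search result.
        double subsetProb = (double) subsetFreq / (double) subsetSize;
        // Relative frequency in the whole index.
        double supersetProb = (double) supersetFreq / (double) supersetSize;

        double absoluteProbChange = subsetProb - supersetProb;
        if (absoluteProbChange <= 0) {
            // The term is no more frequent in the result than in the index.
            return 0;
        }
        double relativeProbChange = subsetProb / supersetProb;
        return absoluteProbChange * relativeProbChange;
    }

    public static void main(String[] args) {
        // Term in 50 of 100 result docs and 125 of 1000 index docs:
        // (0.5 - 0.125) * (0.5 / 0.125) = 0.375 * 4 = 1.5
        System.out.println(score(50, 100, 125, 1000));
        // Term rarer in the result than in the index: score is 0.
        System.out.println(score(1, 100, 125, 1000));
    }
}
```

With these made-up counts the first call yields 1.5, so a score above 1 simply means the term is several times more frequent in the search result than in the index as a whole.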
