Hi Hannes, thanks for the info. Scoring like mutual information sounds fun.
Best regards,
Valentin

On Friday, April 11, 2014 7:40:14 PM UTC+2, Hannes Korte wrote:
>
> Hi Valentin,
>
> > - What is bg_count (I assume background count) but what is the meaning of
> > it?
>
> The bg_count is the number of documents, which contain the term in the
> whole index (not just in the search result).
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-significantterms-aggregation.html
>
> > - At first I thought the score values are between 0 and 1 but there are
> > much bigger values. Can anyone give me a rough explanation?
>
> You can see the code of the computation here:
>
> https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/search/aggregations/bucket/significant/InternalSignificantTerms.java#L94
>
> This is a summarized version of the formula:
>
> double subsetProb = #relative frequency in the search result#;
> double supersetProb = #relative frequency in the whole index#;
> double absoluteProbChange = subsetProb - supersetProb;
> if (absoluteProbChange <= 0) {
>     return 0;
> }
> double relativeProbChange = (subsetProb / supersetProb);
> return absoluteProbChange * relativeProbChange;
>
> I guess in the future there will be support for other scorings like
> mutual information, chi squared or information gain.
>
> Best regards,
> Hannes
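To see why the scores are not bounded by 1, here is a minimal, self-contained Java sketch of the summarized formula quoted above. It is not the actual Elasticsearch code: the class name, the `score` method, its parameters, the zero-size guard and the example counts are all hypothetical. It assumes the "relative frequency" values are simple document-count ratios, with `supersetFreq` playing the role of bg_count and `supersetSize` the number of documents in the whole index.

```java
public class SignificanceScoreSketch {

    /**
     * Hypothetical helper following the quoted summary:
     * (subsetProb - supersetProb) * (subsetProb / supersetProb),
     * or 0 when the term is not more frequent in the result than in the index.
     */
    public static double score(long subsetFreq, long subsetSize,
                               long supersetFreq, long supersetSize) {
        // Assumption: guard against empty sets instead of dividing by zero.
        if (subsetSize == 0 || supersetSize == 0 || supersetFreq == 0) {
            return 0;
        }
        double subsetProb = (double) subsetFreq / subsetSize;       // relative frequency in the search result
        double supersetProb = (double) supersetFreq / supersetSize; // relative frequency in the whole index (bg_count / index size)
        double absoluteProbChange = subsetProb - supersetProb;
        if (absoluteProbChange <= 0) {
            return 0; // term is not over-represented in the result
        }
        double relativeProbChange = subsetProb / supersetProb;
        return absoluteProbChange * relativeProbChange;
    }

    public static void main(String[] args) {
        // Hypothetical counts: the term occurs in 50 of 100 hits,
        // and in 1,000 of 1,000,000 documents in the whole index.
        System.out.println(score(50, 100, 1_000, 1_000_000));
    }
}
```

With these made-up counts the absolute change is 0.5 - 0.001 = 0.499 and the relative change is 0.5 / 0.001 = 500, so the score comes out at roughly 249.5. The relative factor grows without bound as the background frequency shrinks, which is why rare terms that are common in the result get scores far above 1.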
