Hi,

The ES guide states that, when computing the score, *"The same query normalization factor is applied to every document"* (see http://www.elastic.co/guide/en/elasticsearch/guide/master/practical-scoring-function.html#query-norm).

But when I try this example:

    curl -s -XDELETE 'localhost:9200/ttt'
    curl -s -XPUT 'http://localhost:9200/ttt/tweet/1?refresh=true' -d '{ "user" : "a b" }'
    curl -s -XPUT 'http://localhost:9200/ttt/tweet/2?refresh=true' -d '{ "user" : "b c" }'
    curl -s -XGET 'localhost:9200/ttt/_search?explain=true&format=yaml' -d '
    {
      "query": {
        "match" : { "user" : "a b" }
      }
    }
    '

I got this result (the interesting parts are the two queryNorm values):

    took: 5
    timed_out: false
    _shards:
      total: 5
      successful: 5
      failed: 0
    hits:
      total: 2
      max_score: 0.2712221
      hits:
      - _shard: 2
        _node: "_baxQafwQ0WyAAZpyIv2ow"
        _index: "ttt"
        _type: "tweet"
        _id: "1"
        _score: 0.2712221
        _source:
          user: "a b"
        _explanation:
          value: 0.27122214
          description: "sum of:"
          details:
          - value: 0.13561107
            description: "weight(user:a in 0) [PerFieldSimilarity], result of:"
            details:
            - value: 0.13561107
              description: "score(doc=0,freq=1.0), product of:"
              details:
              - value: 0.70710677
                description: "queryWeight, product of:"
                details:
                - value: 0.30685282
                  description: "idf(docFreq=1, maxDocs=1)"
                - value: 2.3043842
                  description: "queryNorm"
              - value: 0.19178301
                description: "fieldWeight in 0, product of:"
                details:
                - value: 1.0
                  description: "tf(freq=1.0), with freq of:"
                  details:
                  - value: 1.0
                    description: "termFreq=1.0"
                - value: 0.30685282
                  description: "idf(docFreq=1, maxDocs=1)"
                - value: 0.625
                  description: "fieldNorm(doc=0)"
          - value: 0.13561107
            description: "weight(user:b in 0) [PerFieldSimilarity], result of:"
            details:
            - value: 0.13561107
              description: "score(doc=0,freq=1.0), product of:"
              details:
              - value: 0.70710677
                description: "queryWeight, product of:"
                details:
                - value: 0.30685282
                  description: "idf(docFreq=1, maxDocs=1)"
                - value: 2.3043842
                  description: "queryNorm"
              - value: 0.19178301
                description: "fieldWeight in 0, product of:"
                details:
                - value: 1.0
                  description: "tf(freq=1.0), with freq of:"
                  details:
                  - value: 1.0
                    description: "termFreq=1.0"
                - value: 0.30685282
                  description: "idf(docFreq=1, maxDocs=1)"
                - value: 0.625
                  description: "fieldNorm(doc=0)"
      - _shard: 3
        _node: "_baxQafwQ0WyAAZpyIv2ow"
        _index: "ttt"
        _type: "tweet"
        _id: "2"
        _score: 0.028130025
        _source:
          user: "b c"
        _explanation:
          value: 0.028130027
          description: "product of:"
          details:
          - value: 0.056260053
            description: "sum of:"
            details:
            - value: 0.056260053
              description: "weight(user:b in 0) [PerFieldSimilarity], result of:"
              details:
              - value: 0.056260053
                description: "score(doc=0,freq=1.0), product of:"
                details:
                - value: 0.29335263
                  description: "queryWeight, product of:"
                  details:
                  - value: 0.30685282
                    description: "idf(docFreq=1, maxDocs=1)"
                  - value: 0.9560043
                    description: "queryNorm"
                - value: 0.19178301
                  description: "fieldWeight in 0, product of:"
                  details:
                  - value: 1.0
                    description: "tf(freq=1.0), with freq of:"
                    details:
                    - value: 1.0
                      description: "termFreq=1.0"
                  - value: 0.30685282
                    description: "idf(docFreq=1, maxDocs=1)"
                  - value: 0.625
                    description: "fieldNorm(doc=0)"
          - value: 0.5
            description: "coord(1/2)"

For the document where only one term matches, the queryNorm is about 2.4 times smaller than for the document where both terms match (0.9560043 vs. 2.3043842). As a result, documents matching only one term are penalized too heavily. I see the same behaviour when using a "bool" query with two "should" clauses.

Is this a bug? If not, what explains this behaviour, and where can I find more information?

Thank you for your help.

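P.S. To double-check the arithmetic in the explain output, here is a rough Python sketch that recomputes both scores from the reported factors. The numbers are copied from the output above; the function and variable names are my own and are not part of any Elasticsearch API. The last check assumes the formula from the guide, queryNorm = 1 / sqrt(sumOfSquaredWeights), with the two query terms each contributing their idf.

```python
import math

# Values copied verbatim from the explain output above.
IDF = 0.30685282             # idf(docFreq=1, maxDocs=1)
TF = 1.0                     # tf(freq=1.0)
FIELD_NORM = 0.625           # fieldNorm(doc=0)
QUERY_NORM_DOC1 = 2.3043842  # queryNorm reported for doc 1 (shard 2)
QUERY_NORM_DOC2 = 0.9560043  # queryNorm reported for doc 2 (shard 3)
COORD_DOC2 = 0.5             # coord(1/2): one of two query terms matched


def term_score(idf, query_norm, tf, field_norm):
    """Score of one term: queryWeight * fieldWeight, as in the explain tree."""
    query_weight = idf * query_norm
    field_weight = tf * idf * field_norm
    return query_weight * field_weight


# Doc 1 matches both terms "a" and "b" with identical factors.
score_doc1 = 2 * term_score(IDF, QUERY_NORM_DOC1, TF, FIELD_NORM)

# Doc 2 matches only "b"; the sum is then multiplied by coord.
score_doc2 = COORD_DOC2 * term_score(IDF, QUERY_NORM_DOC2, TF, FIELD_NORM)

query_norm_ratio = QUERY_NORM_DOC1 / QUERY_NORM_DOC2
score_ratio = score_doc1 / score_doc2

# Assumption: queryNorm = 1 / sqrt(sum of squared term weights), with each
# of the two query terms weighted by the idf reported for doc 1.
expected_query_norm_doc1 = 1.0 / math.sqrt(2 * IDF ** 2)

print("score doc 1:", score_doc1)
print("score doc 2:", score_doc2)
print("queryNorm ratio:", query_norm_ratio)
print("score ratio:", score_ratio)
print("queryNorm for doc 1 from the guide's formula:", expected_query_norm_doc1)
```

The recomputed scores should agree with the reported _score values, and the queryNorm for doc 1 is consistent with the guide's formula. Only the queryNorm for doc 2 does not fit what I expected.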