Hi,
The ES guide states that, when computing the score, *"The same query 
normalization factor is applied to every document"* - see 
http://www.elastic.co/guide/en/elasticsearch/guide/master/practical-scoring-function.html#query-norm
But when I try this example:

curl -s -XDELETE 'localhost:9200/ttt'

curl -s -XPUT 'http://localhost:9200/ttt/tweet/1?refresh=true' -d '{
    "user" : "a b"
}'
curl -s -XPUT 'http://localhost:9200/ttt/tweet/2?refresh=true' -d '{
    "user" : "b c"
}'

curl -s -XGET 'localhost:9200/ttt/_search?explain=true&format=yaml' -d '
{
    "query": {
        "match" : { "user" : "a b" }
    }
}
'

I got this result (the interesting parts are the two queryNorm values):

took: 5
timed_out: false
_shards:
  total: 5
  successful: 5
  failed: 0
hits:
  total: 2
  max_score: 0.2712221
  hits:
  - _shard: 2
    _node: "_baxQafwQ0WyAAZpyIv2ow"
    _index: "ttt"
    _type: "tweet"
    _id: "1"
    _score: 0.2712221
    _source:
      user: "a b"
    _explanation:
      value: 0.27122214
      description: "sum of:"
      details:
      - value: 0.13561107
        description: "weight(user:a in 0) [PerFieldSimilarity], result of:"
        details:
        - value: 0.13561107
          description: "score(doc=0,freq=1.0), product of:"
          details:
          - value: 0.70710677
            description: "queryWeight, product of:"
            details:
            - value: 0.30685282
              description: "idf(docFreq=1, maxDocs=1)"
            - value: 2.3043842
              description: "queryNorm"
          - value: 0.19178301
            description: "fieldWeight in 0, product of:"
            details:
            - value: 1.0
              description: "tf(freq=1.0), with freq of:"
              details:
              - value: 1.0
                description: "termFreq=1.0"
            - value: 0.30685282
              description: "idf(docFreq=1, maxDocs=1)"
            - value: 0.625
              description: "fieldNorm(doc=0)"
      - value: 0.13561107
        description: "weight(user:b in 0) [PerFieldSimilarity], result of:"
        details:
        - value: 0.13561107
          description: "score(doc=0,freq=1.0), product of:"
          details:
          - value: 0.70710677
            description: "queryWeight, product of:"
            details:
            - value: 0.30685282
              description: "idf(docFreq=1, maxDocs=1)"
            - value: 2.3043842
              description: "queryNorm"
          - value: 0.19178301
            description: "fieldWeight in 0, product of:"
            details:
            - value: 1.0
              description: "tf(freq=1.0), with freq of:"
              details:
              - value: 1.0
                description: "termFreq=1.0"
            - value: 0.30685282
              description: "idf(docFreq=1, maxDocs=1)"
            - value: 0.625
              description: "fieldNorm(doc=0)"
  - _shard: 3
    _node: "_baxQafwQ0WyAAZpyIv2ow"
    _index: "ttt"
    _type: "tweet"
    _id: "2"
    _score: 0.028130025
    _source:
      user: "b c"
    _explanation:
      value: 0.028130027
      description: "product of:"
      details:
      - value: 0.056260053
        description: "sum of:"
        details:
        - value: 0.056260053
          description: "weight(user:b in 0) [PerFieldSimilarity], result of:"
          details:
          - value: 0.056260053
            description: "score(doc=0,freq=1.0), product of:"
            details:
            - value: 0.29335263
              description: "queryWeight, product of:"
              details:
              - value: 0.30685282
                description: "idf(docFreq=1, maxDocs=1)"
              - value: 0.9560043
                description: "queryNorm"
            - value: 0.19178301
              description: "fieldWeight in 0, product of:"
              details:
              - value: 1.0
                description: "tf(freq=1.0), with freq of:"
                details:
                - value: 1.0
                  description: "termFreq=1.0"
              - value: 0.30685282
                description: "idf(docFreq=1, maxDocs=1)"
              - value: 0.625
                description: "fieldNorm(doc=0)"
      - value: 0.5
        description: "coord(1/2)"


For the document where only one term matches, the queryNorm (0.9560043) is 
about 2.4 times smaller than for the document where both terms match 
(2.3043842). As a result, documents matching only one term are penalized 
too heavily.
I see the same behaviour when using a "bool" query with two "should" 
clauses.
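
To make the size of the effect concrete, here is a small sketch that recomputes both scores from the numbers in the explain output above. The constants are copied from that output; the helper name term_score and the "same queryNorm" case are only my own illustration of what I expected to see, not something Elasticsearch reports.

```python
# Values copied from the explain output above.
IDF = 0.30685282              # idf(docFreq=1, maxDocs=1)
TF = 1.0                      # tf(freq=1.0)
FIELD_NORM = 0.625            # fieldNorm(doc=0)
QUERY_NORM_DOC1 = 2.3043842   # queryNorm reported for document 1
QUERY_NORM_DOC2 = 0.9560043   # queryNorm reported for document 2


def term_score(query_norm, idf=IDF, tf=TF, field_norm=FIELD_NORM):
    """Score of one matching term: queryWeight * fieldWeight."""
    query_weight = idf * query_norm
    field_weight = tf * idf * field_norm
    return query_weight * field_weight


# Document 1 matches both terms, so no coord penalty applies.
doc1 = 2 * term_score(QUERY_NORM_DOC1)

# Document 2 matches one of the two terms: coord(1/2) = 0.5.
doc2 = 0.5 * term_score(QUERY_NORM_DOC2)

# Hypothetical: document 2 scored with document 1's queryNorm.
doc2_same_norm = 0.5 * term_score(QUERY_NORM_DOC1)

print(f"doc1            = {doc1:.6f}")
print(f"doc2            = {doc2:.6f}")
print(f"doc2, same norm = {doc2_same_norm:.6f}")
print(f"doc1 / doc2            = {doc1 / doc2:.2f}")
print(f"doc1 / doc2, same norm = {doc1 / doc2_same_norm:.2f}")
```

With a shared queryNorm, document 2 would score about 0.068, a quarter of document 1's score. With the reported values it scores about 0.028, almost ten times lower.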

Is this a bug? Or what is the explanation of this behaviour? Where can I 
find more info?

Thank you for your help.
