Re: really bad post_filter performance

spancer.roc.ray Mon, 31 Mar 2014 16:48:06 -0700

Having too many shards may hurt query performance.
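
For a rough sense of scale, here is a minimal sketch of the shard arithmetic for the cluster described in the quoted post. The function name is hypothetical, and the figures are the ones given in the post (50 shards, 2 replicas per shard, 3 machines). Since routing limits each search to a single shard, this only shows how many shard copies each node has to host, not how many a routed query touches.

```python
def shard_copies_per_node(primaries: int, replicas: int, nodes: int) -> float:
    """Average number of shard copies (primaries plus replicas) hosted per node."""
    # Each primary shard exists once, plus one copy per configured replica.
    total_copies = primaries * (1 + replicas)
    return total_copies / nodes


# Figures quoted from the original post: 50 shards, 2 replicas each, 3 machines.
print(shard_copies_per_node(primaries=50, replicas=2, nodes=3))
```

With three copies of every shard spread over three machines, each node ends up holding one copy of all 50 shards, which matches the poster's remark that every machine has a full copy of the index.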

daveey <[email protected]> wrote:


>I just upgraded to ES 1.0.1 from ES 0.9.2 and am seeing huge performance 
>problems.
>
>
>I traced them to what I think is the post_filter. 
>
>
>Here is the query that we used to run against ES 0.9.2
>
> 
>
>{
>  "filter": {
>    "and": [
>      {
>        "terms": {
>          "index_ids": [
>            2134616789944
>          ]
>        }
>      },
>      {
>        "or": [
>          {
>            "term": {
>              "trashed_at": 0
>            }
>          },
>          {
>            "not": {
>              "exists": {
>                "field": "trashed_at"
>              }
>            }
>          }
>        ]
>      }
>    ]
>  }
>}
>
>
>This used to take the 0.9 cluster about 150 ms to execute.
>
>The same query takes about 2.5 s on the 1.0 cluster.
>
>I rewrote it to conform to my understanding of the changes in 1.0, using a 
>filtered query, however, that didn't help.
>
>I then tried to figure out which parts were slow. I now have the following 
>query
>
>{
>  "query": {
>    "filtered": {
>      "query": {
>        "match_all": {}
>      },
>      "filter": {
>        "terms": {
>          "index_ids": [
>            2134616789944
>          ]
>        }
>      }
>    }
>  },
>  "post_filter": {
>    "or": [
>      {"term": {"trashed_at": 0}},
>      {"not": {"exists": {"field": "trashed_at"}}}
>    ]
>  }
>}
>
>
>It takes 2.5 s and returns 34 hits. However, removing the "post_filter" clause:
>
>{
>  "query": {
>    "filtered": {
>      "query": {
>        "match_all": {}
>      },
>      "filter": {
>        "terms": {
>          "index_ids": [
>            2134616789944
>          ]
>        }
>      }
>    }
>  }
>}
>
>
>makes it take 50 ms and still return 34 results.
>
>My conclusion is that it's taking 2.5 seconds to filter 34 results, and that's 
>confusing. 
>
>The cluster uses 3 machines, 50 shards, 2 replicas per shard. This means that 
>each machine has the entire copy of the index. We use the ?routing= parameter, 
>and are always hitting a single shard for the query.
>
>Help?
>
>
>-- 
>You received this message because you are subscribed to the Google Groups 
>"elasticsearch" group.
>To unsubscribe from this group and stop receiving emails from it, send an 
>email to [email protected].
>To view this discussion on the web visit 
>https://groups.google.com/d/msgid/elasticsearch/cd5b6bb1-7fce-4688-84cb-4ec6d0db8f93%40googlegroups.com.
>For more options, visit https://groups.google.com/d/optout.

