I finished reindexing the same dataset into an index with only one shard:
$ curl 'http://localhost:9200/52b1e8c1f8b9d73130000004/_search?pretty=true' -d '{
"size": 0,
"facets": {
"participants": {
"terms": {
"field": "actor.displayName",
"size": 10
}
}
},
"aggs": {
"participants": {
"terms": {
"field": "actor.displayName",
"size": 10
}
}
}
}'
{
"took" : 1377,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"hits" : {
"total" : 1060387,
"max_score" : 0.0,
"hits" : [ ]
},
"facets" : {
"participants" : {
"_type" : "terms",
"missing" : 0,
"total" : 1129848,
"other" : 1111270,
"terms" : [ {
"term" : "totaltrafficbos",
"count" : 3599
}, {
"term" : "mai93thm",
"count" : 2517
}, {
"term" : "mai95thm",
"count" : 2207
}, {
"term" : "mai90thm",
"count" : 2207
}, {
"term" : "totaltrafficnyc",
"count" : 1660
}, {
"term" : "confessions",
"count" : 1534
}, {
"term" : "incidentreports",
"count" : 1468
}, {
"term" : "nji80thm",
"count" : 1180
}, {
"term" : "pai76thm",
"count" : 1142
}, {
"term" : "txi35thm",
"count" : 1064
} ]
}
},
"aggregations" : {
"participants" : {
"buckets" : [ {
"key" : "totaltrafficbos",
"doc_count" : 3599
}, {
"key" : "mai93thm",
"doc_count" : 2517
}, {
"key" : "mai90thm",
"doc_count" : 2207
}, {
"key" : "mai95thm",
"doc_count" : 2207
}, {
"key" : "totaltrafficnyc",
"doc_count" : 1660
}, {
"key" : "confessions",
"doc_count" : 1534
}, {
"key" : "incidentreports",
"doc_count" : 1468
}, {
"key" : "nji80thm",
"doc_count" : 1180
}, {
"key" : "pai76thm",
"doc_count" : 1142
}, {
"key" : "txi35thm",
"doc_count" : 1064
} ]
}
}
}
Now the counts are the same as with faceting and, more importantly,
consistent.
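
To check this without eyeballing the JSON, something like the following could be used. It is only a minimal sketch: it assumes the response shape shown above, the function names are made up for illustration, and the sample is trimmed to the first four entries of the response.

```python
def facet_counts(response, name):
    """Map term -> count for a terms facet in a search response."""
    return {t["term"]: t["count"] for t in response["facets"][name]["terms"]}


def agg_counts(response, name):
    """Map key -> doc_count for a terms aggregation in a search response."""
    buckets = response["aggregations"][name]["buckets"]
    return {b["key"]: b["doc_count"] for b in buckets}


def count_differences(response, name):
    """Return {term: (facet_count, agg_count)} for every term that differs.

    A term missing on one side is reported with None for that side.
    Ordering of terms with equal counts is ignored.
    """
    facets = facet_counts(response, name)
    aggs = agg_counts(response, name)
    return {
        term: (facets.get(term), aggs.get(term))
        for term in sorted(set(facets) | set(aggs))
        if facets.get(term) != aggs.get(term)
    }


# Trimmed copy of the single-shard response above (first four entries only).
sample = {
    "facets": {"participants": {"terms": [
        {"term": "totaltrafficbos", "count": 3599},
        {"term": "mai93thm", "count": 2517},
        {"term": "mai95thm", "count": 2207},
        {"term": "mai90thm", "count": 2207},
    ]}},
    "aggregations": {"participants": {"buckets": [
        {"key": "totaltrafficbos", "doc_count": 3599},
        {"key": "mai93thm", "doc_count": 2517},
        {"key": "mai90thm", "doc_count": 2207},
        {"key": "mai95thm", "doc_count": 2207},
    ]}},
}

print(count_differences(sample, "participants"))  # prints {}
```

An empty result means facet and aggregation agree on every term. Running the same check against several consecutive responses from the 10-shard index would show which terms fluctuate.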
It seems the problem lies in aggregations across multiple shards. How should
I proceed from here?
-- Nils
On Friday, January 31, 2014 4:30:55 PM UTC+1, Nils Dijk wrote:
>
> Hi,
>
> I have been tinkering with elasticsearch 1.0.0RC1 for a bit, especially the
> aggregations. Looking closer at the responses of the aggregations, I
> noticed the numbers fluctuated all the time.
>
> I have an index:
> shards: 10
> replicas: 0
> documents: ~1M
>
> Currently I'm not ingesting data anymore.
>
> When I tried to recreate the terms facet with aggregations, I came up with
> the following:
>
> {
> "size": 0,
> "facets": {
> "participants": {
> "terms": {
> "field": "actor.displayName",
> "size": 10
> }
> }
> },
> "aggs": {
> "participants": {
> "terms": {
> "field": "actor.displayName",
> "size": 10
> }
> }
> }
> }
>
>
> This should give me roundabout the top 10
> (*<https://github.com/elasticsearch/elasticsearch/issues/1305>)
> occurring terms in the 'actor.displayName' field. The terms facet gives the
> same counts over and over again, which is what is expected. However, the
> counts from the aggregations return different numbers every time I invoke
> it. Results of 3 consecutive runs:
> https://gist.github.com/thanodnl/8733837.
>
> Currently I'm reindexing all the documents in an index with only one shard
> to see if that makes a difference.
> This would only solve the problem short term, but our production load is
> too big to fit in one shard.
>
> -- Nils
>