Christoffer,

How much JVM heap are you giving ES, and how large are the sets? According to this page, http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-fielddata.html, it looks like in 1.4 you will be able to control the circuit breaker more through configuration. However, depending on your data set size, I am guessing you will still have to worry about how much you can allocate to the ES heap, since that page seems to indicate the circuit breakers default to fairly high percentages of the heap.
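To make the "percentage of heap" point concrete, here is a rough sketch of the arithmetic. It assumes the 1.4 defaults as I read them in the circuit breaker docs (fielddata 60%, request 40%, parent/total 70% of the JVM heap, set via the `indices.breaker.*` settings); check these against your version and config. The 8 GB heap is just a hypothetical example.

```python
# Rough arithmetic sketch: how many bytes each circuit breaker would allow
# for a given JVM heap. The percentages are ASSUMED 1.4 defaults taken from
# the circuit breaker docs; your version or config may differ.
DEFAULT_BREAKER_PCT = {
    "indices.breaker.fielddata.limit": 60,  # field data loaded for aggs/sorting
    "indices.breaker.request.limit": 40,    # per-request structures (e.g. agg buckets)
    "indices.breaker.total.limit": 70,      # parent breaker over fielddata + request
}

GIB = 1024 ** 3


def breaker_limits(heap_bytes, pct=DEFAULT_BREAKER_PCT):
    """Return {setting_name: limit_in_bytes} for a heap of heap_bytes."""
    return {name: heap_bytes * p // 100 for name, p in pct.items()}


if __name__ == "__main__":
    heap = 8 * GIB  # hypothetical 8 GB heap (e.g. ES_HEAP_SIZE=8g)
    for name, limit in breaker_limits(heap).items():
        print(f"{name}: {limit / GIB:.2f} GiB")
```

As far as I understand, field data for a field is loaded for every document in each segment, not only for the documents matching the query, so it is the size of the whole index (the background set) that has to fit under these limits, which is why the data set size matters so much here.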
I am trying to look into the scalability characteristics of this feature myself because it is interesting for some goals I have, but I haven't found any information about how it scales or what bounds it. In my case I would like to be able to analyse foreground sets of tens to hundreds of thousands of documents against a background set of millions. Since I can't find anything documented, your numbers might give me an idea of whether my use case is crazy or reasonable before I get some testing done with it.

Kevin

On Friday, September 5, 2014 3:19:13 AM UTC-5, Christoffer Vig wrote:
>
> The significant terms aggregation is a really great feature that allows
> for some really interesting data analysis. We quite often experience
> out-of-memory errors, "CircuitBreakingException: Data too large, data would be
> larger than limit",
> which is not hard to understand, given the amount of data and the speed
> requirements.
>
> I think it would be interesting if it were possible to "trade off" speed to
> allow deeper analysis: run significant terms, and possibly other
> aggregations, for as long as needed, just to return some
> (presumably correct) results.
>
