Hi Jason,

Did you end up implementing this with range filters, or did you find another solution? I've just started looking at ElasticSearch and am equally stumped as to why bounding_box isn't much faster.
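For anyone skimming the thread, here is a minimal sketch of the idea under discussion: a bounding box check written as nothing more than range comparisons on lat/lon, with an OR clause for a box that wraps past 180 degrees of longitude. The `BoundingBox` class and `in_bounding_box` function are hypothetical names made up for illustration. This is not how Elasticsearch implements its filter, just the logic one might expect it to reduce to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical helper type for illustration; not an Elasticsearch API."""
    top: float      # northern latitude limit
    left: float     # western longitude limit
    bottom: float   # southern latitude limit
    right: float    # eastern longitude limit


def in_bounding_box(lat: float, lon: float, box: BoundingBox) -> bool:
    """Return True if (lat, lon) lies inside box, using only range comparisons."""
    lat_ok = box.bottom <= lat <= box.top
    if box.left <= box.right:
        # Ordinary box: a single longitude range.
        lon_ok = box.left <= lon <= box.right
    else:
        # Box crosses the 180th meridian: two longitude ranges joined by OR.
        lon_ok = lon >= box.left or lon <= box.right
    return lat_ok and lon_ok


# Example boxes (coordinates chosen purely for illustration).
ordinary = BoundingBox(top=41.0, left=-74.5, bottom=40.0, right=-73.5)
wrapping = BoundingBox(top=-15.0, left=175.0, bottom=-20.0, right=-175.0)

print(in_bounding_box(40.7, -74.0, ordinary))   # point inside the ordinary box
print(in_bounding_box(-17.5, 179.0, wrapping))  # point inside the wrapping box
```

If the filter really did reduce to comparisons like these, it would be reasonable to expect timings close to a pair of numeric range filters, which is exactly why the numbers quoted below are surprising.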
On Monday, May 13, 2013 6:43:15 PM UTC-5, Jason wrote:
>
> The server is almost certainly CPU bound and disk IO bound, although this
> is just an impression gleaned from running "top". It's not a huge
> server and is really just for prototyping the solution, so I get that the
> resource limitations of the server would be hindering search performance.
> What I still don't understand is why bounding box queries are
> significantly slower than range queries. I understand from previous
> investigations that geo_distance queries are more resource intensive due to
> the need to load all locations into memory and then perform operations on
> this set (which is going to hammer the CPU), but I *assumed* that bounding
> box queries would not do this and would instead simply manifest as a set of
> range queries that treat lat/lon values as plain floating point numeric
> values. If I do a simple range query on other numerical values on the
> documents, I get query times that are not significantly
> different from queries with other similar (simple) filters. Whereas if I do
> a bounding_box filter, I see similar performance to running a
> geo_distance query. Actually *extremely* similar. On a data set where a
> range query takes 500ms and a geo_distance takes >= 20,000ms, an equivalent
> bounding box query takes >= 20,000ms, which makes me think it's doing a
> similar set of processes to the geo_distance query. This is the
> confusing point.
>
> Geo Distance, as far as I can determine, *needs* to load values into memory
> because there is no automatic way of knowing the linear distance between
> two points without computing it. But a bounding box is surely just a
> range. If the lat/lon of the document is *within* the bounding box as
> determined by a series of >=,<= conditions, then one would think it matches
> the query. Naturally there would need to be some consideration for
> "wrapping" values around equatorial/meridian values, but again this should
> just be an *OR* clause.
>
> Clearly there is something missing in my understanding of how bounding box
> queries work, and I guess I just wanted to understand how ES had implemented
> this. I *am* sure that the ES peeps know what they're doing and there is
> likely to be a very good reason why I'm talking out of my ass.
>
> On Monday, May 13, 2013 4:27:10 PM UTC-7, Brian Gadoury wrote:
>>
>> On Monday, May 13, 2013 4:53:40 PM UTC-6, Jason wrote:
>>>
>>> So, the standard answer is "get more servers", which seems to be what
>>> you're saying, and that makes complete sense. I guess I was curious about
>>> why ES would need so much more than a simple lucene index, which could
>>> handle at least this amount without any problems, and why bounding box
>>> queries specifically would be slow considering I assumed they were just a
>>> set of simple numerical ranges. But I guess what I should *really* do
>>> is just say, "listen.. you can't build elastic search so stop bitching and
>>> just do what they tell you to".
>>>
>>
>> Ultimately, yes. I think you need more horsepower. I can't say why,
>> which I understand is the crux of your frustration here.
>>
>> You're still flying blind (and somewhat limiting the amount of help this
>> group can give you) if you aren't monitoring your server stats and figuring
>> out where your bottleneck is. You can have BigDesk running as a plugin with
>> 1 command and a service restart. It'll take less than 60 seconds to do.
>>
>> -Phunk
