Hi Jason,
Did you end up implementing this using range filters or was there another 
solution? I've just started looking at ElasticSearch and am equally stumped 
as to why bounding_box isn't much faster.

On Monday, May 13, 2013 6:43:15 PM UTC-5, Jason wrote:
>
> The server is almost certainly both CPU bound and disk IO bound, although this 
> is just an informal impression gleaned from running "top".  It's not a huge 
> server and is really just for prototyping the solution, so I get that the 
> resource limitations of the server would be hindering search performance. 
>  What I still don't understand is why bounding box queries are 
> significantly slower than range queries.  I understand from previous 
> investigations that geo_distance queries are more resource intensive due to 
> the need to load all locations into memory and then perform operations on 
> this set (which is going to hammer the CPU) but I *assumed* that bounding 
> box queries would not do this and would instead simply manifest as a set of 
> range queries that treat lat/lon values as simply floating point numeric 
> values.  If I do a simple range query on other numerical values on the 
> documents I get query performance times that are not significantly 
> different to queries with other similar (simple) filters.  Whereas if I do 
> a bounding_box filter I see similar performance as when running a 
> geo_distance query.  Actually *extremely* similar.  On a data set where a 
> range query takes 500ms and a geo_distance takes >= 20,000ms an equivalent 
> bounding box query takes >= 20,000ms, which makes me think it's doing much 
> the same work as the geo_distance query.  This is the 
> confusing point.
>
> Geo Distance as far as I can determine *needs* to load values into memory 
> because there is no automatic way of knowing the linear distance between 
> two points without computing it.  But a bounding box is surely just a 
> range.  If the lat/lon of the document is *within* the bounding box as 
> determined by a series of >=,<= conditions then one would think it matches 
> the query.  Naturally there would need to be some consideration for 
> "wrapping" longitude values around the 180-degree meridian but again this 
> should just be an *OR* clause.
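
To make the "bounding box is just a range" reading concrete, here is a minimal Python sketch of the check described in the quoted paragraph above. It only illustrates the idea; it is not a claim about how ES implements bounding_box, and the function and parameter names are hypothetical.

```python
def in_bounding_box(lat, lon, top, left, bottom, right):
    """Return True if (lat, lon) lies inside the box, edges included.

    Hypothetical helper for illustration only: top/bottom are latitudes,
    left/right are longitudes, all in degrees.
    """
    # Latitude never wraps, so it is a single closed range.
    if not (bottom <= lat <= top):
        return False
    if left <= right:
        # Ordinary box: one longitude range.
        return left <= lon <= right
    # Box crosses the 180-degree meridian: two ranges joined by OR.
    return lon >= left or lon <= right


if __name__ == "__main__":
    # A point inside an ordinary box, then one inside a box that wraps.
    print(in_bounding_box(40.7, -74.0, top=41, left=-75, bottom=40, right=-73))
    print(in_bounding_box(0.0, -175.0, top=10, left=170, bottom=-10, right=-170))
```

Each branch is nothing more than >= and <= comparisons, which is why one would expect it to map onto plain numeric range filters over separately indexed lat and lon values.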
>
> Clearly there is something missing in my understanding of how bounding box 
> queries work and I guess I just wanted to understand how ES had implemented 
> this.  I *am* sure that the ES peeps know what they're doing and there is 
> likely to be a very good reason why I'm talking out of my ass.
>
> On Monday, May 13, 2013 4:27:10 PM UTC-7, Brian Gadoury wrote:
>>
>> On Monday, May 13, 2013 4:53:40 PM UTC-6, Jason wrote:
>>>
>>> So, the standard answer is "get more servers", which seems to be what 
>>> you're saying and that makes complete sense.  I guess I was curious about 
>>> why ES would need so much more than a simple lucene index which could 
>>> handle at least this amount without any problems, and why bounding box 
>>> queries specifically would be slow considering I assumed they were just a 
>>> set of simple numerical ranges.  But I guess what I should *really* do 
>>> is just say, "listen.. you can't build elastic search so stop bitching and 
>>> just do what they tell you to".
>>>
>>
>>  Ultimately, yes. I think you need more horsepower. I can't say why, 
>> which I understand is the crux of your frustration here.
>>
>> You're still flying blind (and somewhat limiting the amount of help this 
>> group can give you) if you aren't monitoring your server stats and figuring 
>> out where your bottleneck is. You can have BigDesk running as a plugin with 
>> 1 command and a service restart. It'll take less than 60 seconds to do.
>>
>> -Phunk
>>
>
