Hey Travis,

Unfortunately, because this was more than 3 months ago, my brain has 
determined it is no longer relevant and hence all the information has 
seeped out of my ears and into the trash.

That being said, I don't think we ever "solved" the problem.  The project 
this was for was ultimately canned so it never went into production and 
hence never got the care and attention it needed.

If I *had* continued to develop the system, I *think* I may have attempted 
my own bounding box criteria that did simple numerical ranges.  No doubt I 
would have also subsequently realized that computing the coordinates of a 
"box" as a set of spherical coordinates is harder than it looks and got it 
completely wrong.. but I'd have had a go.
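
For what it's worth, here is a minimal sketch of what I mean by "simple 
numerical ranges".  This is not how ES implements bounding_box; the function 
and parameter names are made up for illustration, and it assumes coordinates 
in degrees with longitudes in [-180, 180].  Only longitude needs the wrap-around 
check, since latitude stops at the poles.

```python
def in_bounding_box(lat, lon, south, north, west, east):
    """Return True if (lat, lon) is inside the box, using only range checks.

    Hypothetical helper for illustration; all values are in degrees.
    """
    # Latitude never wraps, so a single range comparison is enough.
    if not (south <= lat <= north):
        return False

    # Normal case: the box does not cross the +/-180 degree meridian.
    if west <= east:
        return west <= lon <= east

    # Box crosses the antimeridian (e.g. west=170, east=-170), so the
    # longitude test becomes an OR of two open-ended ranges.
    return lon >= west or lon <= east


if __name__ == "__main__":
    # A point near San Francisco against a box around the Bay Area.
    print(in_bounding_box(37.77, -122.42, 37.0, 38.0, -123.0, -122.0))
    # A point just west of the date line against a box that straddles it.
    print(in_bounding_box(0.0, -175.0, -10.0, 10.0, 170.0, -170.0))
```

The same conditions could be expressed as ordinary range filters on separate 
numeric lat and lon fields, with the wrap-around case as an OR of two ranges.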

On Wednesday, January 15, 2014 4:26:52 PM UTC-8, Travis Bullock wrote:
>
> Hi Jason,
> Did you end up implementing this using range filters or was there another 
> solution? I've just started looking at ElasticSearch and am equally stumped 
> as to why bounding_box isn't much faster.
>
> On Monday, May 13, 2013 6:43:15 PM UTC-5, Jason wrote:
>>
>> The server is almost certainly CPU bound and disk IO bound, although this 
>> is just tacit information gleaned from running "top".  It's not a huge 
>> server and is really just for prototyping the solution, so I get that the 
>> resource limitations of the server would be hindering search performance. 
>>  What I still don't understand is why bounding box queries are 
>> significantly slower than range queries.  I understand from previous 
>> investigations that geo_distance queries are more resource intensive due to 
>> the need to load all locations into memory and then perform operations on 
>> this set (which is going to hammer the CPU) but I *assumed* that 
>> bounding box queries would not do this and would instead simply manifest as 
>> a set of range queries that treat lat/lon values as simply floating point 
>> numeric values.  If I do a simple range query on other numerical values on 
>> the documents I get query performance times that are not significantly 
>> different to queries with other similar (simple) filters.  Whereas if I do 
>> a bounding_box filter I see similar performance as when running a 
>> geo_distance query.  Actually *extremely* similar.  On a data set where 
>> a range query takes 500ms and a geo_distance takes >= 20,000ms an 
>> equivalent bounding box query takes >= 20,000ms, which makes me think it's 
>> doing a similar set of processes as done by the geo_distance query.  This 
>> is the confusing point.
>>
>> Geo Distance as far as I can determine *needs* to load values into 
>> memory because there is no automatic way of knowing the linear distance 
>> between two points without computing it.  But a bounding box is surely just 
>> a range.  If the lat/lon of the document is *within* the bounding box as 
>> determined by a series of >=,<= conditions then one would think it matches 
>> the query.  Naturally there would need to be some consideration for 
>> "wrapping" values around equatorial/meridian values but again this should 
>> just be an *OR* clause.
>>
>> Clearly there is something missing in my understanding of how bounding 
>> box queries work and I guess I just wanted to understand how ES had 
>> implemented this.  I *am* sure that the ES peeps know what they're doing 
>> and there is likely to be a very good reason why I'm talking out of my ass.
>>
>> On Monday, May 13, 2013 4:27:10 PM UTC-7, Brian Gadoury wrote:
>>>
>>> On Monday, May 13, 2013 4:53:40 PM UTC-6, Jason wrote:
>>>>
>>>> So, the standard answer is "get more servers", which seems to be what 
>>>> you're saying and that makes complete sense.  I guess I was curious about 
>>>> why ES would need so much more than a simple lucene index which could 
>>>> handle at least this amount without any problems, and why bounding box 
>>>> queries specifically would be slow considering I assumed they were just a 
>>>> set of simple numerical ranges.  But I guess what I should *really* do 
>>>> is just say, "listen.. you can't build elastic search so stop bitching and 
>>>> just do what they tell you to".
>>>>
>>>
>>>  Ultimately, yes. I think you need more horsepower. I can't say why, 
>>> which I understand is the crux of your frustration here.
>>>
>>> You're still flying blind (and somewhat limiting the amount of help this 
>>> group can give you) if you aren't monitoring your server stats and figuring 
>>> out where your bottleneck is. You can have BigDesk running as a plugin with 
>>> 1 command and a service restart. It'll take less than 60 seconds to do.
>>>
>>> -Phunk
>>>
>>
