Yeah, sorry, I was tired when I wrote that. I've seen the window sent to each shard and, now that I think about it, being able to set my own max value for that might be neat. It wasn't what I was thinking about at the time, though.
I just happened upon something that needs the sort parameter, so I looked it up and saw that the field being sorted on is loaded into memory. My concern was that it'd be possible (but useless) to construct a query that matches all documents and then ask Elasticsearch to sort them all, in effect pulling that particular field into memory for every document.

So my question was: is there a way to limit the number of documents that need that field pulled into memory? Suppose I have a million documents per shard and the field I'm sorting on averages a hundred bytes. That means slurping about 100 MB into memory. That isn't quick, and it consumes 1/300th of the heap on the node just for one shard. In my case I'd prefer to sort only the first ten thousand documents and warn the user that the sorting wasn't wholly accurate. I suppose I could execute a count and refuse to do the search at all if the count comes back too high, but that seems less pleasant. I have the same feeling about faceting as well. And, yeah, I'm not being clear about what "the first" really means because I haven't thought that part through.

I did poke around the implementation and saw that it loads the terms into memory for each segment. I didn't see where it unpins the loaded terms, though. Does it unpin them when it is done with the segment?

Sorry for the rambling email, I guess I'm still tired.

Nik

On Fri, Jan 24, 2014 at 5:03 AM, Adrien Grand <[email protected]> wrote:
> Hi Nik,
>
> Indeed Elasticsearch builds priority queues of size `from+size` on each
> shard in order to find the top hits, which are then merged to get the
> collection-wide top hits. My understanding is that you would like to be
> able to configure an upper limit for the size of this priority queue, is it
> correct? I think this would be a great addition!
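
A minimal Python sketch of the per-shard queue and merge described in the quoted paragraph above, in case it helps to see the effect of a cap. It is a toy model, not Elasticsearch code: the function names, the `(sort_value, doc_id)` hit shape, and the `max_window` cap are all hypothetical, and `max_window` is not an existing setting. It models only the size of the queue; it says nothing about how much field data is loaded for sorting.

```python
import heapq
from itertools import islice
from typing import Iterable, List, Optional, Tuple

Hit = Tuple[str, str]  # (sort_value, doc_id); hypothetical shape for illustration


def shard_top_hits(hits: Iterable[Hit], from_: int, size: int,
                   max_window: Optional[int] = None) -> List[Hit]:
    """Return up to `from_ + size` hits of one shard in ascending sort order.

    `max_window` is a hypothetical upper limit on the per-shard window.
    """
    window = from_ + size
    if max_window is not None:
        window = min(window, max_window)
    # nsmallest returns at most `window` hits, smallest sort value first.
    return heapq.nsmallest(window, hits)


def merge_top_hits(per_shard: List[List[Hit]], from_: int, size: int) -> List[Hit]:
    """Merge the already-sorted shard results and slice out the requested page."""
    merged = heapq.merge(*per_shard)
    return list(islice(merged, from_, from_ + size))


shards = [
    [("b", "d1"), ("e", "d2"), ("a", "d3")],
    [("c", "d4"), ("d", "d5")],
]

# Uncapped: every shard returns from+size hits, so the merged page is exact.
exact = merge_top_hits([shard_top_hits(s, 0, 2) for s in shards], 0, 2)
print(exact)  # [('a', 'd3'), ('b', 'd1')]

# Capped at one hit per shard: the second page loses ('b', 'd1').
capped = merge_top_hits(
    [shard_top_hits(s, 1, 2, max_window=1) for s in shards], 1, 2)
print(capped)  # [('c', 'd4')]
```

The capped run shows the trade-off: with a window smaller than `from+size`, the merged page can skip hits that an exact sort would return (here `('b', 'd1')`), which is the "less accurate results" case from the original question.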
>
> On Fri, Jan 24, 2014 at 9:56 AM, Nikolas Everett <[email protected]> wrote:
>
>> Is there a way to issue a query to Elasticsearch using the sort parameter
>> that limits the number of results that are sorted either per shard or
>> during the merge phase? I don't want to be able to accidentally load all
>> the documents into memory but I'm ok with returning less accurate results.
>>
>> Nik
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/CAPmjWd0uLma%3DiLF%3DrUj6cW_%3D5Cf0Y2e87%3DmB3-WKJ_f1dRqH9A%40mail.gmail.com.
>> For more options, visit https://groups.google.com/groups/opt_out.
>
> --
> Adrien Grand
