Yeah, sorry, I was tired when I wrote that.  I've seen the window sent to
each shard and, now that I think about it, being able to set my own max
value for that might be neat.  It wasn't what I was thinking about at the
time, though.

I just happened upon something that needs the sort parameter, so I figured
I should look it up, and I saw that the field is loaded into memory.  My
concern was that it'd be possible (but useless) to construct a query that
matches all documents and then ask Elasticsearch to sort them all, which in
effect pulls that particular field into memory for every document.  So my
question was: is there a way to limit the number of documents that need
that field pulled into memory?

Suppose I have a million documents per shard and the field I'm sorting on
takes an average of a hundred bytes.  That means I'm having to slurp 100MB
of stuff into memory.  That isn't quick, and it consumes 1/300th of the
heap on the node just for one shard.  In my case I'd prefer to just sort
the first ten thousand documents and warn the user that the sorting wasn't
wholly accurate.  I suppose I could execute a count and, if the count comes
back too high, refuse to do the search at all, but that seems less
pleasant.  I suppose I have the same feeling about faceting as well.  And,
yeah, I'm not being clear about what "the first" really means because I
haven't really thought that part through.
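
To put numbers on that, here is a minimal Python sketch of the arithmetic
and of the count-then-refuse idea.  The function names and the 10,000
limit are just illustrative, and the match count is assumed to come from a
separate count request made beforehand; nothing here calls Elasticsearch.

```python
def field_memory_bytes(docs_per_shard: int, avg_field_bytes: int) -> int:
    """Rough size of loading one field for every document on a shard."""
    return docs_per_shard * avg_field_bytes


def plan_sort(match_count: int, max_sorted_docs: int = 10_000) -> str:
    """Decide what to do with a sorted search, given a prior count.

    Hypothetical guard: returns "sort" when the count is within the
    limit and "refuse" when it is above it.
    """
    return "sort" if match_count <= max_sorted_docs else "refuse"


# One million documents at about one hundred bytes each.
per_shard = field_memory_bytes(1_000_000, 100)
print(per_shard)        # bytes for one shard, roughly 100MB
print(per_shard * 300)  # heap size implied by the 1/300th figure

print(plan_sort(2_500))
print(plan_sort(1_000_000))
```

This only covers the "refuse" branch.  Sorting just "the first" ten
thousand documents would still need a definition of what "the first"
means.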

I did poke around the implementation and I saw that it loads the terms into
memory for each segment.  I didn't see where it unpins the loaded terms,
though. Does it unpin them when it is done with the segment?

Sorry for the rambling email, I guess I'm still tired.

Nik



On Fri, Jan 24, 2014 at 5:03 AM, Adrien Grand <
[email protected]> wrote:

> Hi Nik,
>
> Indeed Elasticsearch builds priority queues of size `from+size` on each
> shard in order to find the top hits, which are then merged to get the
> collection-wide top hits. My understanding is that you would like to be
> able to configure an upper limit for the size of this priority queue, is it
> correct? I think this would be a great addition!
>
> On Fri, Jan 24, 2014 at 9:56 AM, Nikolas Everett <[email protected]> wrote:
>
>> Is there a way to issue a query to Elasticsearch using the sort parameter
>> that limits the number of results that are sorted either per shard or
>> during the merge phase?  I don't want to be able to accidentally load all
>> the documents into memory but I'm ok with returning less accurate results.
>>
>> Nik
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/CAPmjWd0uLma%3DiLF%3DrUj6cW_%3D5Cf0Y2e87%3DmB3-WKJ_f1dRqH9A%40mail.gmail.com
>> .
>> For more options, visit https://groups.google.com/groups/opt_out.
>>
>
>
>
> --
> Adrien Grand
