Hi Jörg, I am a colleague of Jeff's. Thanks for your help so far.
Is there any performance gain from using 'limit' in queries on clusters with multiple shards across multiple nodes? Our expectation was that 'limit' would restrict the number of documents each shard needs to process, thereby hopefully improving performance. We then use 'size' to ultimately limit the number of documents returned. If there is nothing to be gained from using 'limit', then perhaps we have misunderstood its purpose, and the solution to our problem is simply to stop using it!

Thank you again,
Dónal

On Wednesday, October 22, 2014 5:47:48 PM UTC-4, Jörg Prante wrote:
>
> I am not sure why you are after "limit". It is not a "size" parameter and
> it does not work as you expect. There is no guarantee for 5 shards and
> limit = 5 that you can always obtain 25 docs.
>
> For filters, Elasticsearch has added some Lucene extensions regarding the
> iteration of doc sets. One extension is the "LimitFilter". Lucene uses doc
> IDs for enumerating docs in the index reader contexts and the IDs are
> unordered but they are non-decreasing. There can be many segments on a
> shard, each segment carries such a doc ID sequence. On a shard,
> Elasticsearch iterates through the matching docs of a filter when applying
> a LimitFilter, and this iteration can be short-cut by setting a limit for
> this iteration. The price to pay is that parts of the matched docs in the
> filter may be dropped. Most users do not want that, this is a very advanced
> setting. This is not "non-deterministic", it is just very low level.
>
> Jörg
>
>
> On Wed, Oct 22, 2014 at 11:07 PM, Jeff Gandt <[email protected]> wrote:
>
>> I realize "limit" is not a limit for response size. I'm actually ok with
>> getting more than one result. I'm actually not relying on limit for a size.
>>
>> I often use size in conjunction with limit. I'll do this when I really
>> don't care how many items I get back, as long as it is within a range.
>> But I implement the limit to help decrease the load on the shards.
>>
>> That said, I need to understand what expectations I can have around
>> limit. Is it completely non-deterministic? Or can I have reasonable
>> expectations about it?
>>
>> I will propose an example and describe my expectations:
>>
>> Node setup:
>> 1 index
>> 1 mapping
>> 5 shards
>> 1,000,000 documents sharded across the 5 shards
>> 1000 matching documents sharded across the 5 shards
>> let's assume normal distribution of the matching documents: 200 documents
>> per shard. I realize this is not realistic to get an exact distribution
>> like this.
>>
>> If I place a limit of 5 on the query, I expect 25 documents back. That
>> is, I get 5 documents from each node. I expect this because I have at least
>> 5 matching documents per shard. In fact, I have many more than 5 matching
>> documents per shard. But I expect the limit to return five documents from
>> each shard.
>>
>> Now I realize there are lots of real world circumstance that would cause
>> the query to return fewer than 25 documents. Let's ignore those for the
>> time being and remain under the assumption that the distribution is even.
>>
>> Now, if I place a limit of 1 on the query, I expect 5 documents back.
>>
>> Are these two expectations correct?
>>
>> Now let's assume a worst case scenario: all of the matching documents are
>> on one shard. A limit of 5 should still return 5 documents. A limit of 1
>> should return 1 document.
>>
>> If these expectations are true, then my original scenario is valid and a
>> limit of 1 should still return 1 document.
>>
>> So are these expectations valid? Or is limit completely non-deterministic?
>>
>> Size does work, but if I can improve performance with a limit, I would
>> like to do so. It is possible that I have tens of thousands of matching
>> documents, and limit could be an excellent short-circuit. Basically I want
>> the shard to stop searching as soon as it has found one document.
>>
>> Also, I don't have the document _id so I cannot make the HEAD call.
>>
>> Do these clarifications help?
>>
>> On Wednesday, October 22, 2014 3:57:25 PM UTC-4, Jörg Prante wrote:
>>>
>>> "limit" is not a limit for response size. It sets a shard limit which is
>>> quite low level, so the resources per shard of ES are not so much under
>>> pressure. Whether the sum of the limits on the shards matches the total
>>> length of the response is not guaranteed.
>>>
>>> The limit parameter for the response is the "size" parameter. Can you try
>>>
>>> POST profiles/profile/_search
>>> {
>>>   "size" : 1,
>>>   "query": {
>>>     "constant_score" : {
>>>       "filter" : {
>>>         "term": {
>>>           "profile_id": "salinger-23145"
>>>         }
>>>       }
>>>     }
>>>   }
>>> }
>>>
>>> and see if this works better?
>>>
>>> If you want to perform a true existence check of a doc, you should use
>>> the doc _id and a head request, something like
>>>
>>> HEAD profiles/profile/id
>>>
>>> which is faster than a search.
>>>
>>> Jörg
>>>
>>>
>>> On Wed, Oct 22, 2014 at 8:58 PM, Jeff Gandt <[email protected]> wrote:
>>>
>>>> I have a query that I want to return only one document. Basically, I
>>>> want to do an existence check on a document with a given term filter.
>>>>
>>>> I am executing:
>>>>
>>>> POST profiles/profile/_search
>>>> {
>>>>   "query": {
>>>>     "filtered": {
>>>>       "filter": {
>>>>         "bool": {
>>>>           "must": [
>>>>             {
>>>>               "limit": {
>>>>                 "value": 1
>>>>               }
>>>>             },
>>>>             {
>>>>               "term": {
>>>>                 "profile_id": "salinger-23145"
>>>>               }
>>>>             }
>>>>           ]
>>>>         }
>>>>       }
>>>>     }
>>>>   }
>>>> }
>>>>
>>>> The profiles/profile mapping has tens of millions of documents in it,
>>>> two of which match the given terms query (when the limit is removed
>>>> entirely).
>>>>
>>>> When I execute the query, I get zero results back. However, if I change
>>>> the limit value to two (2) then one (1) result is returned. If I change
>>>> the limit value to three (3) then two (2) results are returned.
>>>> It's almost like there is an off by one error in limit.
>>>>
>>>> So am I:
>>>>
>>>> 1) Writing the query wrong?
>>>> I tried placing the limit outside of the must, bool, and filter
>>>> clauses. That caused errors in each case. But I may have just done
>>>> something silly.
>>>>
>>>> 2) Misunderstanding limit?
>>>> My understanding of limit is that it returns no more than x documents
>>>> per shard. Given that I have five shards and at least two documents
>>>> matching the query, I should be returning between one and five documents.
>>>> However, looking at the limit documentation
>>>> <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-limit-filter.html>
>>>> I suspect that I may be misunderstanding how limit works. The wording "to
>>>> execute on" leads me to believe that it may only be selecting ONE document
>>>> against which the term filter is run. Thus, if the one document that it
>>>> tests doesn't match, it returns zero results. However, the limit 2
>>>> returning one document leads me to believe that my original understanding
>>>> is correct.
>>>>
>>>> 3) Staring at an elasticsearch limit bug?
>>>> Unfortunately I have been unable to reproduce the error after creating
>>>> test indexes and mappings. The limit behaves exactly as I expect in every
>>>> other case.
>>>>
>>>> 4) Doing something else that is equally silly?
>>>>
>>>> Any help or suggestions is appreciated. Can I provide any
>>>> clarifications?
>>>>
>>>> Thanks,
>>>>
>>>> .jpg
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "elasticsearch" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/elasticsearch/86988787-fb31-4b5a-b570-427750177ecd%40googlegroups.com.
>>>> For more options, visit https://groups.google.com/d/optout.
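
As a rough illustration of the arithmetic discussed in this thread, the sketch below computes only an upper bound on the number of hits a per-shard limit can yield. It is a toy model in Python, not Elasticsearch code: the function name max_hits_with_shard_limit and the per-shard match counts are hypothetical, and it assumes the limit simply caps how many documents each shard contributes. As Jörg notes, the real LimitFilter may drop matching documents, so actual results can fall below this bound.

```python
def max_hits_with_shard_limit(matches_per_shard, limit):
    """Upper bound on total hits when each shard contributes at most `limit` docs.

    Toy model only (an assumption for illustration): the real filter may
    return fewer documents than this bound.
    """
    return sum(min(matches, limit) for matches in matches_per_shard)


# Jeff's even distribution: 1000 matches spread over 5 shards.
even = [200, 200, 200, 200, 200]
print(max_hits_with_shard_limit(even, 5))  # 25
print(max_hits_with_shard_limit(even, 1))  # 5

# Worst case from the thread: all matches on a single shard.
skewed = [1000, 0, 0, 0, 0]
print(max_hits_with_shard_limit(skewed, 5))  # 5
```

Since this is only a ceiling, "size" remains the parameter to rely on for how many documents come back.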
