I realize "limit" does not cap the response size, and I'm fine with 
getting more than one result. I'm not relying on limit to control size.

I often use size in conjunction with limit. I do this when I don't care 
exactly how many items I get back, as long as the count is within a range. 
The limit is there to help decrease the load on the shards.

That said, I need to understand what expectations I can have around limit. 
Is it completely non-deterministic? Or can I have reasonable expectations 
about it?

I will propose an example and describe my expectations:

Node setup:
1 index
1 mapping
5 shards
1,000,000 documents sharded across the 5 shards
1000 matching documents sharded across the 5 shards
Let's assume an even distribution of the matching documents: 200 documents 
per shard. I realize an exactly even distribution like this is not 
realistic.

If I place a limit of 5 on the query, I expect 25 documents back. That is, 
I get 5 documents from each shard. I expect this because each shard has far 
more than 5 matching documents, so the limit should return five documents 
from each shard.

Now I realize there are lots of real-world circumstances that would cause 
the query to return fewer than 25 documents. Let's ignore those for the 
time being and remain under the assumption that the distribution is even.

Now, if I place a limit of 1 on the query, I expect 5 documents back.

Are these two expectations correct?

Now let's assume a worst case scenario: all of the matching documents are 
on one shard. A limit of 5 should still return 5 documents. A limit of 1 
should return 1 document.

If these expectations are true, then my original scenario is valid and a 
limit of 1 should still return 1 document.
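
To make these expectations concrete, here is a minimal sketch of the model 
I have in mind. It only encodes my assumption that each shard stops after 
collecting "limit" matches and that size is large enough not to trim the 
result. It is not a statement of how Elasticsearch actually behaves, and 
the function name is made up for illustration.

```python
def expected_hits(matches_per_shard, limit):
    """Documents returned if every shard contributes at most `limit` matches.

    Assumption (mine, not confirmed): each shard stops after `limit`
    matching documents and `size` does not trim the merged result.
    """
    return sum(min(limit, matches) for matches in matches_per_shard)


# Even distribution: 1000 matching documents over 5 shards.
even = [200] * 5
# Worst case: all matching documents on a single shard.
skewed = [1000, 0, 0, 0, 0]

print(expected_hits(even, 5))    # 25
print(expected_hits(even, 1))    # 5
print(expected_hits(skewed, 5))  # 5
print(expected_hits(skewed, 1))  # 1
```

Under this model the result count ranges from limit (all matches on one 
shard) up to limit times the number of shards (every shard has enough 
matches).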

So are these expectations valid? Or is limit completely non-deterministic?

Size does work, but if I can improve performance with a limit, I would like 
to do so. It is possible that I have tens of thousands of matching 
documents, and limit could be an excellent short-circuit. Basically I want 
the shard to stop searching as soon as it has found one document.
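
For reference, this is roughly the request body I mean when I say I 
combine size with limit. It is a sketch that assumes the 1.x-style 
"filtered" query with a "limit" filter; the helper name, the "status" 
field, and its value are placeholders, not part of my real mapping.

```python
import json


def build_limited_search(query, limit, size):
    """Build a search body that pairs a per-shard limit filter with size.

    Assumes the 1.x query DSL ("filtered" query wrapping a "limit" filter).
    """
    return {
        "size": size,  # caps the merged response
        "query": {
            "filtered": {
                "query": query,
                # intended to cap the work done on each shard
                "filter": {"limit": {"value": limit}},
            }
        },
    }


# Hypothetical field and value, for illustration only.
body = build_limited_search({"term": {"status": "active"}}, limit=1, size=1)
print(json.dumps(body, indent=2))
```

With limit=1 and size=1 the intent is that each shard does as little work 
as possible and the response still contains a single document.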

Also, I don't have the document _id so I cannot make the HEAD call.

Do these clarifications help?
