Re: [jira] Commented: (SOLR-2218) Performance of start= and rows= parameters are exponentially slow with large data sets

Grant Ingersoll Sat, 08 Jan 2011 05:16:55 -0800

The weird thing is that all of our collectors, IMO, are optimized for the 
non-paging scenario, when I would venture to guess that the large majority 
of users out there do paging.  AFAICT, about the only people who don't do 
paging are those who do deep, downstream analysis that requires them to 
retrieve hundreds, thousands, or more results at a time (I've seen as many as 
a million used in production) as part of a batch job.


See https://issues.apache.org/jira/browse/LUCENE-2215 and 
https://issues.apache.org/jira/browse/SOLR-1726 for the issues tracking this.

-Grant

On Jan 8, 2011, at 7:11 AM, Earwin Burrfoot wrote:

> On Mon, Jan 3, 2011 at 18:18, Yonik Seeley <[email protected]> wrote:
>> On Thu, Nov 11, 2010 at 3:22 PM, Jan Høydahl / 
>> Cominvent<[email protected]> wrote:
>>> The problem with large "start" is probably worse when sharding is involved. 
>>> Anyone know how the shard component goes about fetching 
>>> start=1000000&rows=10 from, say, 10 shards? Does it have to merge sorted 
>>> lists of 1,000,000+10 doc IDs from each shard, which is the worst case?
>> 
>> Yep, that's how it works today.
>> 
> 
> Technically, if your docs have an unbiased distribution across shards
> (with regard to their sort value), you can fetch far fewer than
> topN docs from each shard.
> I played with the idea, and it worked for me. Later, though, I dropped
> the optimization, as it complicated things somewhat and my users don't
> often query gazillions of docs.
> 
> 
> -- 
> Kirill Zakharenko/Кирилл Захаренко ([email protected])
> Phone: +7 (495) 683-567-4
> ICQ: 104465785
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
> 
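
For illustration, the worst case confirmed in the quoted thread (each shard
returns its own top start+rows entries and the coordinator merges them) can be
sketched roughly as below. This is a toy model in Python, not Solr's actual
shard component; the names shard_top and distributed_page, and the
(score, doc_id) tuples, are assumptions made up for the example.

```python
import heapq
from itertools import islice


def shard_top(shard_docs, n):
    """Return one shard's top n (score, doc_id) entries, best score first."""
    return heapq.nlargest(n, shard_docs)


def distributed_page(shards, start, rows):
    """Fetch one page across shards (hypothetical coordinator logic).

    Every shard must return start + rows entries, because in the worst case
    the whole requested page lives on a single shard.
    Returns the page and the total number of entries the shards shipped.
    """
    n = start + rows
    per_shard = [shard_top(docs, n) for docs in shards]
    fetched = sum(len(top) for top in per_shard)
    # Each per-shard list is already sorted descending, so a k-way merge works.
    merged = heapq.merge(*per_shard, reverse=True)
    page = list(islice(merged, start, start + rows))
    return page, fetched


if __name__ == "__main__":
    # 30 docs spread round-robin over 3 shards; the score equals the doc number.
    shards = [[(i, f"d{i}") for i in range(30) if i % 3 == s] for s in range(3)]
    page, fetched = distributed_page(shards, start=5, rows=3)
    print(page, fetched)
```

In this model the number of entries shipped to the coordinator grows as
shards * (start + rows), which is why a request like start=1000000&rows=10
over 10 shards is so expensive.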


