Joel Bernstein created SOLR-13494:
-------------------------------------

             Summary: Improve the performance of random sampling
                 Key: SOLR-13494
                 URL: https://issues.apache.org/jira/browse/SOLR-13494
             Project: Solr
          Issue Type: Improvement
      Security Level: Public (Default Security Level. Issues are Public)
          Components: streaming expressions
            Reporter: Joel Bernstein


Currently the *random* Streaming Expression performs a conventional distributed
search: each shard returns its top N docs, and the aggregator node then selects
the top N from the combined results. The aggregator has to merge up to
numShards * N docs, so this technique slows down as the number of shards or N
grows.

A distributed random sample does not require this merge. Instead, each shard
can select N/numShards random docs and the aggregator can simply return all of
them. This technique gets faster, not slower, as more shards are added, because
each shard returns fewer docs.
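
To make the difference concrete, here is a minimal Python simulation of the two strategies. It is only a sketch, not Solr code: shards are modelled as in-memory lists, the function names are hypothetical, and spreading the remainder of N/numShards over the first few shards is an assumption the ticket does not specify. The second return value counts how many docs reach the aggregator.

```python
import random


def conventional_sample(shards, n, rng):
    """Each shard returns its top n docs by a random sort key;
    the aggregator then keeps the top n of everything it received."""
    candidates = []
    for shard in shards:
        keyed = sorted((rng.random(), doc) for doc in shard)
        candidates.extend(keyed[:n])          # up to n docs per shard
    shipped = len(candidates)                 # docs sent to the aggregator
    candidates.sort()
    return [doc for _, doc in candidates[:n]], shipped


def per_shard_sample(shards, n, rng):
    """Each shard returns about n / numShards random docs;
    the aggregator returns all of them without re-sorting."""
    base, extra = divmod(n, len(shards))
    result = []
    for i, shard in enumerate(shards):
        # Assumption: the first `extra` shards take one additional doc.
        quota = base + (1 if i < extra else 0)
        result.extend(rng.sample(shard, min(quota, len(shard))))
    return result, len(result)


rng = random.Random(42)
# Four hypothetical shards holding 1000 docs each.
shards = [[f"shard{s}-doc{d}" for d in range(1000)] for s in range(4)]
conv_docs, conv_shipped = conventional_sample(shards, 100, rng)
quota_docs, quota_shipped = per_shard_sample(shards, 100, rng)
print(conv_shipped, quota_shipped)
```

In this model the conventional strategy ships numShards * N docs to the aggregator, while the per-shard strategy ships N in total. Note that the per-shard variant draws the same share from every shard, so it matches a global random sample only when the shards hold roughly the same number of docs.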

This ticket will allow the random Streaming Expression to use the strategy
above when N reaches a certain threshold (e.g. 10000).
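
The threshold switch could look roughly like the sketch below. It computes how many rows each shard would be asked for under either strategy. The function name, the default of 10000 (taken from the example value in this ticket), and the handling of the remainder are all assumptions for illustration, not the eventual implementation.

```python
def plan_shard_rows(n, num_shards, threshold=10000):
    """Return the number of rows to request from each shard.

    Below the threshold, every shard is asked for n rows (conventional
    distributed search). At or above it, n is split across the shards.
    """
    if n < threshold:
        return [n] * num_shards
    base, extra = divmod(n, num_shards)
    # Assumption: the first `extra` shards take one additional row.
    return [base + (1 if i < extra else 0) for i in range(num_shards)]


small = plan_shard_rows(100, 4)      # conventional: 100 rows per shard
large = plan_shard_rows(10000, 3)    # split: quotas add up to 10000
print(small, large)
```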

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

