[jira] [Created] (LUCENE-5299) Refactor Collector API for parallelism

Shikhar Bhushan (JIRA) Mon, 21 Oct 2013 09:00:26 -0700

Shikhar Bhushan created LUCENE-5299:
---------------------------------------


             Summary: Refactor Collector API for parallelism
                 Key: LUCENE-5299
                 URL: https://issues.apache.org/jira/browse/LUCENE-5299
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Shikhar Bhushan


h2. Motivation

We should be able to scale up better with Solr/Lucene by using multiple CPU 
cores, rather than having to scale out by sharding (with all the associated 
distributed-system pitfalls) when the index size does not warrant it.

Presently, IndexSearcher accepts an optional ExecutorService constructor 
argument, which is used to search in parallel on call paths where one of the 
TopDocCollectors is created internally. The per-atomic-reader searches run in 
parallel, and the TopDocs/TopFieldDocs results are then merged, with locking 
around the merge step.
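
To make the pattern concrete, here is a minimal sketch of "search segments in parallel, merge under a lock". It is not IndexSearcher's actual code and uses no Lucene classes: LockedMergeSketch, TopN and search are hypothetical names, and a segment is modelled as a plain list of scores.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative stand-in only: all names here are hypothetical, not Lucene API.
public class LockedMergeSketch {

    // Keeps the n largest scores seen so far in a min-heap guarded by a lock.
    static final class TopN {
        private final int n;
        private final PriorityQueue<Double> heap = new PriorityQueue<>();
        private final ReentrantLock lock = new ReentrantLock();

        TopN(int n) { this.n = n; }

        // Merge one segment's scores; the lock serializes concurrent merges.
        void merge(List<Double> segmentScores) {
            lock.lock();
            try {
                for (double score : segmentScores) {
                    heap.offer(score);
                    if (heap.size() > n) {
                        heap.poll(); // drop the current smallest score
                    }
                }
            } finally {
                lock.unlock();
            }
        }

        // Call only after all merges have completed.
        List<Double> sortedDescending() {
            List<Double> out = new ArrayList<>(heap);
            out.sort((a, b) -> Double.compare(b, a));
            return out;
        }
    }

    // One task per segment; every task funnels into the same locked merge.
    static List<Double> search(List<List<Double>> segments, int n,
                               ExecutorService executor) throws Exception {
        TopN merged = new TopN(n);
        List<Future<?>> pending = new ArrayList<>();
        for (List<Double> segment : segments) {
            pending.add(executor.submit(() -> merged.merge(segment)));
        }
        for (Future<?> f : pending) {
            f.get(); // wait for each segment and propagate any failure
        }
        return merged.sortedDescending();
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            List<List<Double>> segments = Arrays.asList(
                    Arrays.asList(0.9, 0.1),
                    Arrays.asList(0.5, 0.7),
                    Arrays.asList(0.3));
            System.out.println(search(segments, 3, executor)); // prints [0.9, 0.7, 0.5]
        } finally {
            executor.shutdown();
        }
    }
}
```

In this sketch the merge logic lives inside the search driver, so it only works for the one result type the driver knows about. That is the limitation the points below describe.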

However, there are some problems with this approach:

* If arbitrary Collector arguments come into play, we can't parallelize. Note 
that even if results ultimately go to a TopDocCollector, it may be wrapped 
inside e.g. an EarlyTerminatingCollector or a TimeLimitingCollector, or both.
* The special-casing, with parallelism baked on top, does not scale. Many 
Collectors could potentially lend themselves to parallelism, and 
special-casing means the parallelization has to be re-implemented whenever a 
different permutation of collectors is used.

h2. Proposal

Refactor the collectors so that parallelization is supported at the level of 
the collection protocol.

Some requirements that should guide the implementation:

* easy migration path for collectors that need to remain serial
* the parallelization should be composable (when collectors wrap other 
collectors)
* allow collectors to pick the optimal solution (e.g. there might be memory 
tradeoffs to be made) by advising each collector whether a search will be 
parallelized, so that the serial use case is not penalized.
* encourage use of non-blocking constructs and lock-free parallelism; blocking 
is not advisable in the hot spot of a search, and it also wastes pooled threads.
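
As a rough illustration of what these requirements could look like together, here is one possible shape for such a protocol. Everything in it is an assumption made for the sake of example: ParallelCollector, SegmentCollector, forSegment, reduce, CountingCollector and EvenDocsOnly are hypothetical names and do not exist in Lucene. Each segment gets its own single-threaded collector, a flag tells the collector whether the search is parallel, and partial results are combined in a single-threaded reduce step, so no locks are needed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical protocol sketch; none of these types exist in Lucene.
public class ParallelCollectorSketch {

    // Per-segment worker: used by one thread at a time, so it needs no locking.
    interface SegmentCollector<R> {
        void collect(int doc);
        R result();
    }

    // Top-level collector: hands out per-segment workers and reduces their results.
    interface ParallelCollector<R> {
        // 'parallel' advises the collector, so the serial case need not be penalized.
        SegmentCollector<R> forSegment(boolean parallel);
        R reduce(List<R> partials);
    }

    // Leaf collector: counts hits.
    static final class CountingCollector implements ParallelCollector<Integer> {
        public SegmentCollector<Integer> forSegment(boolean parallel) {
            return new SegmentCollector<Integer>() {
                private int count;
                public void collect(int doc) { count++; }
                public Integer result() { return count; }
            };
        }
        public Integer reduce(List<Integer> partials) {
            int total = 0;
            for (int partial : partials) total += partial;
            return total;
        }
    }

    // Wrapping collector: forwards only even doc ids. It composes with any
    // delegate because it simply passes the protocol calls through.
    static final class EvenDocsOnly<R> implements ParallelCollector<R> {
        private final ParallelCollector<R> delegate;
        EvenDocsOnly(ParallelCollector<R> delegate) { this.delegate = delegate; }
        public SegmentCollector<R> forSegment(boolean parallel) {
            SegmentCollector<R> inner = delegate.forSegment(parallel);
            return new SegmentCollector<R>() {
                public void collect(int doc) { if (doc % 2 == 0) inner.collect(doc); }
                public R result() { return inner.result(); }
            };
        }
        public R reduce(List<R> partials) { return delegate.reduce(partials); }
    }

    // Driver: one task per segment, then a single-threaded reduce.
    static <R> R search(List<int[]> segments, ParallelCollector<R> collector,
                        ExecutorService executor) throws Exception {
        List<Future<R>> futures = new ArrayList<>();
        for (int[] docs : segments) {
            SegmentCollector<R> segmentCollector = collector.forSegment(true);
            futures.add(executor.submit(() -> {
                for (int doc : docs) segmentCollector.collect(doc);
                return segmentCollector.result();
            }));
        }
        List<R> partials = new ArrayList<>();
        for (Future<R> f : futures) partials.add(f.get());
        return collector.reduce(partials);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            List<int[]> segments = Arrays.asList(
                    new int[] {0, 1, 2}, new int[] {3, 4}, new int[] {5});
            ParallelCollector<Integer> collector = new EvenDocsOnly<>(new CountingCollector());
            System.out.println(search(segments, collector, executor)); // prints 3
        } finally {
            executor.shutdown();
        }
    }
}
```

The driver never needs to know which collectors are stacked, so a different permutation of wrappers requires no new parallelization code. A collector that must remain serial could, under the same assumptions, be run by calling forSegment(false) once and feeding it every segment on the calling thread.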



--
This message was sent by Atlassian JIRA
(v6.1#6144)
