I wanted to add a note on the motivation for these changes. Essentially,
the goal is for Solr/Lucene to scale up better, not just scale out by
sharding. Sharding puts you in distributed-systems territory with all its
pitfalls and failure modes, which is not ideal, especially if your index is
not large enough to warrant it. The next frontier seems to be using
multiple CPU cores to do a single search request's legwork, and the
results are very promising: 2-3x speedups in many cases!

Would be great to get a sense of committer interest :) Let me know if I
should just open a JIRA with patches.


On Sat, Oct 19, 2013 at 6:18 PM, Shikhar Bhushan <sbhus...@etsy.com> wrote:

> I got inspired by Uwe Schindler's talk "Is your index reader really
> atomic or maybe slow?"
> <http://www.lucenerevolution.org/sites/default/files/Schindler%20-%20IsYourIndexReaderReallyAtomicOrMaybeSlow_0.pdf>
> to look into the state of parallelism when searching across segments. It
> seems there has been some effort to do this, but we never went all the way.
>
> IndexSearcher has an optional constructor arg for an ExecutorService,
> which gets used to search in parallel on the code paths where one of the
> TopDocsCollectors is created internally. The per-segment searches happen in
> parallel, and then the TopDocs/TopFieldDocs results are merged, with
> locking around the merge step.
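
As a toy illustration of that fan-out-and-merge pattern (this is not Lucene's actual code; the segment-score arrays and the searchSegment/searchTopK helpers are made up for the sketch): each segment's top-k is computed on the pool, and the final merge is the only step that touches shared state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy sketch of per-segment parallel search followed by a serialized merge.
public class ParallelSegmentSearch {

    // A (docId, score) pair standing in for Lucene's ScoreDoc.
    record ScoreDoc(int doc, float score) {}

    // Score every doc in one segment and keep its top-k,
    // like a per-segment top-docs collector.
    static List<ScoreDoc> searchSegment(float[] segmentScores, int docBase, int k) {
        List<ScoreDoc> hits = new ArrayList<>();
        for (int i = 0; i < segmentScores.length; i++) {
            hits.add(new ScoreDoc(docBase + i, segmentScores[i]));
        }
        hits.sort((a, b) -> Float.compare(b.score(), a.score()));
        return hits.subList(0, Math.min(k, hits.size()));
    }

    // Fan out one task per segment, then merge. The merge is the only
    // shared-state step, hence the part that needs serializing/locking.
    static List<ScoreDoc> searchTopK(float[][] segments, int k) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<List<ScoreDoc>>> futures = new ArrayList<>();
            int docBase = 0;
            for (float[] seg : segments) {
                final int base = docBase;
                futures.add(pool.submit(() -> searchSegment(seg, base, k)));
                docBase += seg.length;
            }
            List<ScoreDoc> merged = new ArrayList<>();
            for (Future<List<ScoreDoc>> f : futures) {
                merged.addAll(f.get()); // merge step, serialized here
            }
            merged.sort((a, b) -> Float.compare(b.score(), a.score()));
            return merged.subList(0, Math.min(k, merged.size()));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        float[][] segments = { {0.1f, 0.9f, 0.3f}, {0.8f, 0.2f}, {0.95f} };
        System.out.println(searchTopK(segments, 2));
    }
}
```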
>
> What's the upside of performing parallel search? I benchmarked using
> luceneutil on WIKI_MEDIUM_10M: trunk vs. trunk with the IndexSearcher
> constructor hacked to initialize a ForkJoinPool
> <https://gist.github.com/anonymous/7048089>.
>
> Report after iter 19:
>                     Task   QPS baseline   StdDev   QPS patched   StdDev                Pct diff
>                  Respell          63.05   (3.0%)         46.37   (2.9%)   -26.5% ( -31% -  -21%)
>                 PKLookup         255.80   (2.6%)        194.11   (7.4%)   -24.1% ( -33% -  -14%)
>                   Fuzzy2          54.66   (3.8%)         42.28   (3.5%)   -22.7% ( -28% -  -16%)
>                   Fuzzy1          75.35   (2.9%)         59.23   (4.2%)   -21.4% ( -27% -  -14%)
>          MedSloppyPhrase         141.74   (3.6%)        124.56   (9.7%)   -12.1% ( -24% -    1%)
>               AndHighLow         922.70   (3.1%)        908.18  (10.0%)    -1.6% ( -14% -   11%)
>          LowSloppyPhrase          88.50   (4.2%)        106.64  (13.0%)    20.5% (   3% -   39%)
>                  LowTerm         694.97   (2.7%)        949.30  (21.6%)    36.6% (  11% -   62%)
>              MedSpanNear         102.47   (3.0%)        160.94  (12.7%)    57.1% (  40% -   75%)
>             OrNotHighLow          98.58   (4.9%)        155.22  (18.1%)    57.5% (  32% -   84%)
>               AndHighMed         260.31   (1.7%)        448.19  (23.5%)    72.2% (  46% -   99%)
>                OrHighLow         101.68   (6.7%)        179.91  (27.8%)    76.9% (  39% -  119%)
>                  MedTerm         200.49   (3.1%)        365.70  (32.8%)    82.4% (  45% -  122%)
>                 HighTerm         145.36   (3.3%)        268.51  (27.7%)    84.7% (  51% -  119%)
>                  Prefix3          81.59   (2.8%)        153.52  (24.8%)    88.2% (  58% -  119%)
>             OrHighNotLow          50.67   (6.9%)         95.91  (22.5%)    89.3% (  56% -  127%)
>                OrHighMed          93.37   (6.3%)        182.37  (24.3%)    95.3% (  60% -  134%)
>             OrHighNotMed          78.19   (6.7%)        153.30  (30.7%)    96.1% (  54% -  143%)
>               OrHighHigh          46.59   (7.0%)         92.33  (31.7%)    98.2% (  55% -  147%)
>            OrHighNotHigh          51.01   (5.7%)        105.32  (25.1%)   106.5% (  71% -  145%)
>                 Wildcard          78.53   (3.7%)        168.42  (40.8%)   114.5% (  67% -  164%)
>            OrNotHighHigh          40.42   (5.5%)         93.45  (29.5%)   131.2% (  91% -  175%)
>             OrNotHighMed          24.40   (4.8%)         57.00  (22.0%)   133.6% ( 102% -  168%)
>             HighSpanNear           4.13   (3.6%)         10.09  (10.5%)   144.3% ( 125% -  164%)
>              LowSpanNear          14.27   (2.3%)         35.24  (17.0%)   147.0% ( 124% -  170%)
>                   IntNRQ           8.73   (4.8%)         21.69  (23.6%)   148.6% ( 114% -  185%)
>                LowPhrase          29.11   (2.9%)         72.87  (22.4%)   150.3% ( 121% -  180%)
>                MedPhrase          33.19   (3.4%)         83.38  (40.4%)   151.3% ( 103% -  201%)
>         HighSloppyPhrase           6.46   (5.8%)         16.37  (21.9%)   153.5% ( 119% -  192%)
>               HighPhrase          10.00   (6.8%)         25.66  (14.4%)   156.7% ( 126% -  190%)
>              AndHighHigh          44.82   (1.1%)        115.77  (39.6%)   158.3% ( 116% -  201%)
>
> That looks pretty awesome! Any feedback from existing users?
> *Problems*
>
>    - (relatively minor) Solr can't take advantage of this yet.
>    SolrIndexSearcher disregards the possibility of propagating an
>    ExecutorService up to the IndexSearcher constructor.
>    - Even if it did, as far as I can tell Solr does not use the
>    IndexSearcher search() variants that create TopDocsCollectors internally
>    and parallelize in the presence of the executor.
>    - If arbitrary Collector args come into play, we don't/can't
>    parallelize. Note that even if results are ultimately going to a
>    TopDocsCollector, it may be wrapped inside e.g. an EarlyTerminatingCollector
>    or a TimeLimitingCollector, or both.
>    - The special-casing of TopDocsCollectors with parallelism layered on
>    top, as IndexSearcher does it, does not scale: there are lots of
>    Collectors that could potentially lend themselves to parallelism.
>
> *Idea*
>
>
> I have been exploring a refactoring of Collectors that allows for
> parallelization at the level of the collection protocol. Some of the design
> decisions:
>
>
>    - easy migration path for collectors that want to remain serial
>
>    - the parallelization should be composable (when Collectors wrap other
> Collectors)
>    - allow collectors to pick the optimal strategy (e.g. there might be
> memory tradeoffs to be made) by advising the collector about whether a search
> will be parallelized
>
>    - encourage lock-free parallelism by providing a segment-level
> done() method that is guaranteed to execute in a single-threaded manner (so
> if you accumulate some state at the segment level, it can be safely
> merged into the composite state without locking)
>
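
To make the done() contract concrete, here is a rough, self-contained sketch. The interface and method names are my paraphrase of the description above, not necessarily the exact signatures in the fork: each SubCollector accumulates per-segment state with no synchronization, and because the framework serializes the done() calls, merging into the parent collector needs no locks. TotalHitCountCollector-style counting makes a minimal example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of the proposed collection protocol (names approximate): a
// Collector hands out one SubCollector per segment; collect() runs
// concurrently across segments, while done() is guaranteed to be
// invoked single-threaded, so merging needs no locks.
public class ParallelProtocolSketch {

    interface SubCollector {
        void collect(int doc);   // per matching doc, one thread per segment
        void done();             // serialized by the framework: lock-free merge
    }

    interface Collector {
        SubCollector subCollector(int segment);
    }

    // TotalHitCountCollector-style divide and conquer:
    // count locally, fold into the shared total in done().
    static class TotalHitCountCollector implements Collector {
        private int totalHits;   // only touched from done(), which is serialized

        public SubCollector subCollector(int segment) {
            return new SubCollector() {
                private int segmentHits;             // thread-confined state
                public void collect(int doc) { segmentHits++; }
                public void done() { totalHits += segmentHits; }
            };
        }

        public int getTotalHits() { return totalHits; }
    }

    // A stand-in "searcher" driving the protocol: collect() in parallel,
    // then call done() on each sub-collector from a single thread.
    // (Future.get() gives the happens-before edge that makes the
    // per-segment state visible to the merging thread.)
    static int search(int[][] segments, TotalHitCountCollector collector) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<SubCollector> subs = new ArrayList<>();
            List<Future<?>> futures = new ArrayList<>();
            for (int s = 0; s < segments.length; s++) {
                SubCollector sub = collector.subCollector(s);
                subs.add(sub);
                int[] docs = segments[s];
                futures.add(pool.submit(() -> { for (int d : docs) sub.collect(d); }));
            }
            for (Future<?> f : futures) f.get();      // wait for parallel collection
            for (SubCollector sub : subs) sub.done(); // serialized merge phase
            return collector.getTotalHits();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int[][] segments = { {0, 1, 2}, {3, 4}, {5, 6, 7, 8} };
        System.out.println(search(segments, new TotalHitCountCollector()));
    }
}
```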
> The code is currently in a fork on github 
> <https://github.com/shikhar/lucene-solr/compare/apache:trunk...trunk>.
>
>
>    - Collector goes from being an abstract class to this interface
> <https://github.com/shikhar/lucene-solr/blob/trunk/lucene/core/src/java/org/apache/lucene/search/Collector.java>.
>  A new interface SubCollector
> <https://github.com/shikhar/lucene-solr/blob/trunk/lucene/core/src/java/org/apache/lucene/search/SubCollector.java>
>  is introduced.
>    - Existing subclasses of Collector can choose either to implement the
> new interface, or to subclass the SerialCollector
> <https://github.com/shikhar/lucene-solr/blob/trunk/lucene/core/src/java/org/apache/lucene/search/SerialCollector.java>
>  abstract class, which maintains essentially the same API as before.
>    - There is javadoc on the new interfaces, but the new collection protocol
> is probably clearer in connection with its usage from IndexSearcher
> <https://github.com/shikhar/lucene-solr/blob/b3098fc/lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java#L571-L636>.
>    - I tried to convert at least all of the collectors that wrap other
> collectors to the new interface, rather than simply extending
> SerialCollector (and also introduced a new WrappingCollector
> <https://github.com/shikhar/lucene-solr/blob/trunk/lucene/core/src/java/org/apache/lucene/search/WrappingCollector.java>
>  base class), so that the wrappers themselves don't block a search
> from being parallelizable. There are probably a few cases I have missed.
>    - An example of a collector that is very easily parallelizable in the
> divide-and-conquer style: TotalHitCountCollector
> <https://github.com/shikhar/lucene-solr/blob/trunk/lucene/core/src/java/org/apache/lucene/search/TotalHitCountCollector.java>
>
>    - An update of TopScoreDocCollector to a parallelizable Collector
> <https://github.com/shikhar/lucene-solr/commit/b3098fcfdeb302481e93aab93eb77d0cabc72f9b?w=1>,
>  with corresponding removal of the special-cased parallelism inside
> IndexSearcher.
>
> Benchmark results <https://gist.github.com/shikhar/7062026>
>
> *Next steps*
> I would love to get your feedback on the idea and implementation, and how
> this could be incorporated into trunk. Code review would be much
> appreciated.
>
> Some follow-up work that needs to be done (help needed!):
>
>    - Making TopFieldCollector a parallelizable Collector, as was done
>    for TopScoreDocCollector, and removing the special-casing inside
>    IndexSearcher.
>    - Solr support -- perhaps making the Executor used by
>    SolrIndexSearcher configurable from solrconfig.xml.
>    - Making more collectors parallelizable -- lots of opportunities here.
>    I guess this will be an ongoing process.
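
For the solrconfig.xml idea, I imagine something along these lines. This is entirely hypothetical: neither the element name nor the attributes exist in Solr today, it's just a strawman for what the configuration could look like.

```xml
<!-- Hypothetical strawman; no such configuration exists in Solr today -->
<indexSearcherExecutor type="fixedThreadPool" threads="8"/>
```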
>
> Looking forward to your thoughts!
>
> Thanks,
>
> Shikhar Bhushan
>
