Refactoring Collector interface for parallelism

Shikhar Bhushan Sat, 19 Oct 2013 15:19:27 -0700

I got inspired by Uwe Schindler's talk "Is your index reader really atomic
or maybe 
slow?"<http://www.lucenerevolution.org/sites/default/files/Schindler%20-%20IsYourIndexReaderReallyAtomicOrMaybeSlow_0.pdf>to
look into the state of parallelism when searching across segments. It
seems there has been some effort to do this but we never went all the way.


IndexSearcher has an optional constructor arg for an ExecutorService, which
gets used for searching in parallel for call paths where one of the
TopDocCollector's is created internally. The per-segment search happens in
parallel and then the TopDocs/TopFieldDocs results are merged with locking
around the merge bit.

What's the upside when performing parallel search? I benchmarked using
luceneutil on WIKI_MEDIUM_10M with trunk vs trunk with IndexSearcher
constructor hacked to initialize a
ForkJoinPool<https://gist.github.com/anonymous/7048089>
.

Report after iter 19:
                    TaskQPS baseline      StdDev QPS patched      StdDev
             Pct diff
                 Respell       63.05      (3.0%)       46.37      (2.9%)
 -26.5% ( -31% -  -21%)
                PKLookup      255.80      (2.6%)      194.11      (7.4%)
 -24.1% ( -33% -  -14%)
                  Fuzzy2       54.66      (3.8%)       42.28      (3.5%)
 -22.7% ( -28% -  -16%)
                  Fuzzy1       75.35      (2.9%)       59.23      (4.2%)
 -21.4% ( -27% -  -14%)
         MedSloppyPhrase      141.74      (3.6%)      124.56      (9.7%)
 -12.1% ( -24% -    1%)
              AndHighLow      922.70      (3.1%)      908.18     (10.0%)
-1.6% ( -14% -   11%)
         LowSloppyPhrase       88.50      (4.2%)      106.64     (13.0%)
20.5% (   3% -   39%)
                 LowTerm      694.97      (2.7%)      949.30     (21.6%)
36.6% (  11% -   62%)
             MedSpanNear      102.47      (3.0%)      160.94     (12.7%)
57.1% (  40% -   75%)
            OrNotHighLow       98.58      (4.9%)      155.22     (18.1%)
57.5% (  32% -   84%)
              AndHighMed      260.31      (1.7%)      448.19     (23.5%)
72.2% (  46% -   99%)
               OrHighLow      101.68      (6.7%)      179.91     (27.8%)
76.9% (  39% -  119%)
                 MedTerm      200.49      (3.1%)      365.70     (32.8%)
82.4% (  45% -  122%)
                HighTerm      145.36      (3.3%)      268.51     (27.7%)
84.7% (  51% -  119%)
                 Prefix3       81.59      (2.8%)      153.52     (24.8%)
88.2% (  58% -  119%)
            OrHighNotLow       50.67      (6.9%)       95.91     (22.5%)
89.3% (  56% -  127%)
               OrHighMed       93.37      (6.3%)      182.37     (24.3%)
95.3% (  60% -  134%)
            OrHighNotMed       78.19      (6.7%)      153.30     (30.7%)
96.1% (  54% -  143%)
              OrHighHigh       46.59      (7.0%)       92.33     (31.7%)
98.2% (  55% -  147%)
           OrHighNotHigh       51.01      (5.7%)      105.32     (25.1%)
 106.5% (  71% -  145%)
                Wildcard       78.53      (3.7%)      168.42     (40.8%)
 114.5% (  67% -  164%)
           OrNotHighHigh       40.42      (5.5%)       93.45     (29.5%)
 131.2% (  91% -  175%)
            OrNotHighMed       24.40      (4.8%)       57.00     (22.0%)
 133.6% ( 102% -  168%)
            HighSpanNear        4.13      (3.6%)       10.09     (10.5%)
 144.3% ( 125% -  164%)
             LowSpanNear       14.27      (2.3%)       35.24     (17.0%)
 147.0% ( 124% -  170%)
                  IntNRQ        8.73      (4.8%)       21.69     (23.6%)
 148.6% ( 114% -  185%)
               LowPhrase       29.11      (2.9%)       72.87     (22.4%)
 150.3% ( 121% -  180%)
               MedPhrase       33.19      (3.4%)       83.38     (40.4%)
 151.3% ( 103% -  201%)
        HighSloppyPhrase        6.46      (5.8%)       16.37     (21.9%)
 153.5% ( 119% -  192%)
              HighPhrase       10.00      (6.8%)       25.66     (14.4%)
 156.7% ( 126% -  190%)
             AndHighHigh       44.82      (1.1%)      115.77     (39.6%)
 158.3% ( 116% -  201%)

That looks pretty awesome! Any feedback from existing users?
*
*
*Problems*

   - (relatively minor) Solr can't take advantage of this yet.
   SolrIndexSearcher's disregard the possibility of propagating an
   ExecutorService up to IndexSearcher constructor.
   - Even if it did, as far as I can tell Solr does not make use of
   IndexSearcher's search() variants that create TopDocCollector's internally
   and parallelize in the presence of the executor.
   - If arbitary Collector args come into play, we don't/can't parallelize.
   Note that even if ultimately results are going to a TopDocCollector it may
   be wrapped inside e.g. a EarlyTerminatingCollector or TimeLimitingCollector
   or both.
   - The special-casing with parallelism layered on top as IndexSearcher
   does with TopDocCollector's does not scale, there are lots* *of
   Collector's that could be potentially lend themselves to parallelism.

*Idea*
*
*

I have been exploring a refactoring of Collector's that allows for
parallelization at the level of the collection protocol. Some of the
design decisions:

   - easy migration path for collectors that want to remain serial
   - the parallelization should be composable (when Collector's wrap
other Collector's)
   - allow collector's to pick the optimal solution (e.g. there might
be memory tradeoffs to be made) by advising the collector about
whether a search will be parallelized
   - encourage use of lock-free parallelism by providing for a
segment-level done() method that is guaranteed to execute in a
single-threaded manner (so if you were to accumulate some state at the
segment-level, it can be safely merged with the composite state
without use of locking)

The code is currently in a fork on github
<https://github.com/shikhar/lucene-solr/compare/apache:trunk...trunk>.

   - Collector goes from being an abstract class to this interface
<https://github.com/shikhar/lucene-solr/blob/trunk/lucene/core/src/java/org/apache/lucene/search/Collector.java>.
A new interface SubCollector
<https://github.com/shikhar/lucene-solr/blob/trunk/lucene/core/src/java/org/apache/lucene/search/SubCollector.java>
is introduced.
   - The existing subclasses of Collector can choose to either
implement the new interface, or subclass a SerialCollector
<https://github.com/shikhar/lucene-solr/blob/trunk/lucene/core/src/java/org/apache/lucene/search/SerialCollector.java>
abstract class which maintains essentially the same API as before.
   - There is javadoc on the new interfaces, but the new collection
protocol is probably clearer in connection with the usage from
IndexSearcher 
<https://github.com/shikhar/lucene-solr/blob/b3098fc/lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java#L571-L636>.
   - I tried to convert at least all of the collectors that wrap other
collectors over to the new interface, rather than simply extending
SerialCollector (also introduced a new WrappingCollector
<https://github.com/shikhar/lucene-solr/blob/trunk/lucene/core/src/java/org/apache/lucene/search/WrappingCollector.java>
base class), so that the wrappers themselves aren't blockers from a
search being parallelizable. There are probably a few cases I have
missed.
   - An example of a collector that is very easily parallelizable in
the divide-and-conquer style: TotalHitCountCollector
<https://github.com/shikhar/lucene-solr/blob/trunk/lucene/core/src/java/org/apache/lucene/search/TotalHitCountCollector.java>
   - An update of TopScoreDocCollector to being a parallelizable
Collector 
<https://github.com/shikhar/lucene-solr/commit/b3098fcfdeb302481e93aab93eb77d0cabc72f9b?w=1>,
and corresponding removal of the special-cased parallelism inside
IndexSearcher.

Benchmark results <https://gist.github.com/shikhar/7062026>

*Next steps*
*
*
I would love to get your feedback on the idea and implementation, and how
this could be incorporated into trunk. Code review would be much
appreciated.

Some follow-up work that needs to be done (help needed!):

   - Making TopFieldCollector a parallelizable Collector, like was done for
   TopScoreDocCollector, and removal of the special-casing inside
   IndexSearcher.
   - Solr support -- perhaps making the Executor to be used by
   SolrIndexSearcher configurable from solrconfig.xml.
   - Making more collector's parallelizable - lots of opportunities here. I
   guess this will be an ongoing process.

Looking forward to your thoughts!

Thanks,

Shikhar Bhushan

Refactoring Collector interface for parallelism

Reply via email to