[ 
https://issues.apache.org/jira/browse/LUCENE-5299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185751#comment-14185751
 ] 

Shikhar Bhushan commented on LUCENE-5299:
-----------------------------------------

Just an update that the code rebased against recent trunk lives at 
https://github.com/shikhar/lucene-solr/tree/LUCENE-5299. I've made various 
tweaks, like being able to throttle per-request parallelism in 
{{ParallelSearchStrategy}}.
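
To illustrate what throttling per-request parallelism on a shared pool can mean, here is a minimal sketch. It is not the code from the branch: {{ThrottledRequest}} and {{runAll}} are hypothetical names, and using a semaphore to cap how many tasks of one request are in flight is only an assumed approach.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;

// Hypothetical helper for illustration; NOT the branch's ParallelSearchStrategy.
final class ThrottledRequest {
    private final ExecutorService pool;   // shared by all requests
    private final int maxParallelism;     // cap for a single request

    ThrottledRequest(ExecutorService pool, int maxParallelism) {
        this.pool = pool;
        this.maxParallelism = maxParallelism;
    }

    /** Runs one request's tasks, at most maxParallelism at a time; results keep input order. */
    <T> List<T> runAll(List<Callable<T>> tasks) {
        Semaphore permits = new Semaphore(maxParallelism);
        List<Future<T>> futures = new ArrayList<>();
        try {
            for (Callable<T> task : tasks) {
                permits.acquire(); // the submitting thread waits; pool threads never block here
                Callable<T> wrapped = () -> {
                    try {
                        return task.call();
                    } finally {
                        permits.release(); // free a slot for this request's next task
                    }
                };
                futures.add(pool.submit(wrapped));
            }
            List<T> results = new ArrayList<>();
            for (Future<T> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    public static void main(String[] args) {
        ForkJoinPool pool = new ForkJoinPool(4);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int i = 0; i < 6; i++) {
                final int segment = i;
                tasks.add(() -> segment * segment); // stand-in for searching one segment
            }
            System.out.println(new ThrottledRequest(pool, 2).runAll(tasks));
        } finally {
            pool.shutdown();
        }
    }
}
```

In this sketch the pool size bounds total concurrency across requests while the permit count bounds a single request, which is one way to read the two arguments in the benchmark setup.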

luceneutil benchmark numbers for that branch, run with:
  + a hacked IndexSearcher constructor that uses {{ParallelSearchStrategy(new ForkJoinPool(128), 8)}}
  + luceneutil constants.py SEARCH_NUM_THREADS = 16

Compared against trunk on a 32 core (with HT) Sandy Bridge server, with source 
{{wikimedium500k}}:

{noformat}
Report after iter 19:
                    Task    QPS baseline      StdDev    QPS parcol      StdDev                 Pct diff
                  Fuzzy1           81.91     (43.2%)         52.96     (39.7%)   -35.3% ( -82% -   83%)
                 LowTerm         2550.11     (11.9%)       1927.28      (5.6%)   -24.4% ( -37% -   -7%)
                 Respell           43.02     (39.4%)         35.23     (31.5%)   -18.1% ( -63% -   87%)
                  Fuzzy2           19.32     (25.1%)         16.40     (34.8%)   -15.1% ( -59% -   59%)
                 MedTerm         1679.37     (12.2%)       1743.27      (8.6%)     3.8% ( -15% -   28%)
                PKLookup          221.58      (8.3%)        257.36     (13.2%)    16.1% (  -4% -   41%)
              AndHighLow         1027.99     (11.6%)       1278.39     (15.9%)    24.4% (  -2% -   58%)
              AndHighMed          741.50     (10.0%)       1198.04     (27.5%)    61.6% (  21% -  110%)
               MedPhrase          709.04     (11.6%)       1203.02     (24.3%)    69.7% (  30% -  119%)
             LowSpanNear          601.13     (16.9%)       1127.30     (16.7%)    87.5% (  46% -  145%)
         LowSloppyPhrase          554.87     (10.8%)       1130.25     (30.5%)   103.7% (  56% -  162%)
               OrHighMed          408.55     (10.4%)        977.56     (20.1%)   139.3% (  98% -  189%)
               LowPhrase          364.36     (10.8%)        893.27     (41.0%)   145.2% (  84% -  220%)
               OrHighLow          355.78     (12.7%)        893.63     (19.6%)   151.2% ( 105% -  210%)
             AndHighHigh          390.73     (10.3%)       1004.70     (24.3%)   157.1% ( 111% -  213%)
                HighTerm          399.01     (11.8%)       1067.67     (12.1%)   167.6% ( 128% -  217%)
                Wildcard          754.76     (11.6%)       2067.96     (28.0%)   174.0% ( 120% -  241%)
            HighSpanNear          153.57     (14.8%)        463.54     (24.3%)   201.8% ( 141% -  282%)
              OrHighHigh          212.16     (12.4%)        665.56     (28.2%)   213.7% ( 154% -  290%)
              HighPhrase          170.49     (13.1%)        547.72     (17.3%)   221.3% ( 168% -  289%)
        HighSloppyPhrase           66.91     (10.1%)        219.59     (12.0%)   228.2% ( 187% -  278%)
         MedSloppyPhrase          128.73     (12.5%)        425.67     (20.3%)   230.7% ( 175% -  300%)
             MedSpanNear          130.31     (10.7%)        436.12     (18.2%)   234.7% ( 185% -  295%)
                 Prefix3          166.91     (14.9%)        652.64     (26.7%)   291.0% ( 217% -  390%)
                  IntNRQ          110.73     (15.0%)        467.72     (33.6%)   322.4% ( 238% -  436%)
{noformat}
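
As a reading aid for the table: the Pct diff figures are consistent with the relative change in mean QPS from the baseline column to the parcol column. The sketch below recomputes that figure for three rows. {{PctDiff}} is a hypothetical helper written for this note, not part of luceneutil, and it does not attempt to reproduce the range shown in parentheses.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not luceneutil code.
final class PctDiff {
    /** Relative change of the candidate QPS versus the baseline QPS, in percent. */
    static double pctDiff(double baselineQps, double candidateQps) {
        return 100.0 * (candidateQps - baselineQps) / baselineQps;
    }

    public static void main(String[] args) {
        // QPS values copied from the report above: {baseline, parcol}.
        String[] tasks = {"Fuzzy1", "HighTerm", "IntNRQ"};
        double[][] qps = {{81.91, 52.96}, {399.01, 1067.67}, {110.73, 467.72}};
        for (int i = 0; i < tasks.length; i++) {
            System.out.println(String.format(Locale.ROOT, "%s: %.1f%%",
                    tasks[i], pctDiff(qps[i][0], qps[i][1])));
        }
    }
}
```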


> Refactor Collector API for parallelism
> --------------------------------------
>
>                 Key: LUCENE-5299
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5299
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Shikhar Bhushan
>         Attachments: LUCENE-5299.patch, LUCENE-5299.patch, LUCENE-5299.patch, 
> LUCENE-5299.patch, LUCENE-5299.patch, benchmarks.txt
>
>
> h2. Motivation
> We should be able to scale up better with Solr/Lucene by utilizing multiple 
> CPU cores, and not have to resort to scaling out by sharding (with all the 
> associated distributed-system pitfalls) when the index size does not warrant 
> it.
> Presently, IndexSearcher has an optional constructor arg for an 
> ExecutorService, which gets used for searching in parallel for call paths 
> where one of the TopDocCollectors is created internally. The 
> per-atomic-reader search happens in parallel and then the 
> TopDocs/TopFieldDocs results are merged with locking around the merge bit.
> However, there are some problems with this approach:
> * If arbitrary Collector args come into play, we can't parallelize. Note that 
> even if results are ultimately going to a TopDocCollector, it may be wrapped 
> inside e.g. an EarlyTerminatingCollector or TimeLimitingCollector or both.
> * The special-casing with parallelism baked on top does not scale. There are 
> many Collectors that could potentially lend themselves to parallelism, and 
> special-casing means the parallelization has to be re-implemented if a 
> different permutation of collectors is to be used.
> h2. Proposal
> A refactoring of collectors that allows for parallelization at the level of 
> the collection protocol. 
> Some requirements that should guide the implementation:
> * easy migration path for collectors that need to remain serial
> * the parallelization should be composable (when collectors wrap other 
> collectors)
> * allow collectors to pick the optimal solution (e.g. there might be memory 
> tradeoffs to be made) by advising the collector about whether a search will 
> be parallelized, so that the serial use-case is not penalized.
> * encourage use of non-blocking constructs and lock-free parallelism; 
> blocking is not advisable in the hot spot of a search, and it also wastes 
> pooled threads.
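
The proposal quoted above can be pictured with a small sketch. Everything in it is an assumption made for illustration: {{SegmentedCollector}}, {{TopScores}}, {{MinScore}} and {{ParallelSearchSketch}} are hypothetical names, segments are modeled as plain arrays of scores, and none of this is Lucene's Collector API or the API from the patch. It shows only two of the listed requirements: each segment collects into its own private state so the hot loop needs no locks, and a wrapping collector composes without re-implementing the parallelism.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

// Hypothetical driver: one task per segment, a single merge at the end.
final class ParallelSearchSketch {
    static <S, R> R search(float[][] segments, SegmentedCollector<S, R> collector,
                           ExecutorService pool) {
        List<CompletableFuture<S>> futures = new ArrayList<>();
        for (float[] segment : segments) {
            futures.add(CompletableFuture.supplyAsync(() -> {
                S state = collector.newSegmentState(); // private to this task
                for (int doc = 0; doc < segment.length; doc++) {
                    collector.collect(state, doc, segment[doc]);
                }
                return state;
            }, pool));
        }
        List<S> states = new ArrayList<>();
        for (CompletableFuture<S> future : futures) {
            states.add(future.join());
        }
        return collector.merge(states); // runs on the calling thread, after all segments
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            float[][] segments = {{0.2f, 0.9f}, {0.5f}, {0.7f, 0.1f}};
            System.out.println(search(segments, new MinScore<>(new TopScores(2), 0.3f), pool));
        } finally {
            pool.shutdown();
        }
    }
}

// Hypothetical protocol: per-segment state S, merged into a result R.
interface SegmentedCollector<S, R> {
    S newSegmentState();
    void collect(S state, int doc, float score);
    R merge(List<S> states);
}

// Buffers scores per segment; merge keeps the n highest overall.
final class TopScores implements SegmentedCollector<List<Float>, List<Float>> {
    private final int n;
    TopScores(int n) { this.n = n; }
    public List<Float> newSegmentState() { return new ArrayList<>(); }
    public void collect(List<Float> state, int doc, float score) { state.add(score); }
    public List<Float> merge(List<List<Float>> states) {
        return states.stream().flatMap(List::stream)
                .sorted(Comparator.reverseOrder()).limit(n).collect(Collectors.toList());
    }
}

// Wrapper that drops low scores and delegates the rest, including the merge.
final class MinScore<S, R> implements SegmentedCollector<S, R> {
    private final SegmentedCollector<S, R> inner;
    private final float min;
    MinScore(SegmentedCollector<S, R> inner, float min) { this.inner = inner; this.min = min; }
    public S newSegmentState() { return inner.newSegmentState(); }
    public void collect(S state, int doc, float score) {
        if (score >= min) inner.collect(state, doc, score);
    }
    public R merge(List<S> states) { return inner.merge(states); }
}
```

The sketch leaves out the requirement of advising a collector whether the search will be parallelized, and it buffers every score per segment for brevity where a real collector would keep a bounded queue.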



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
