[jira] [Updated] (SOLR-6581) Prepare CollapsingQParserPlugin and ExpandComponent for 5.0

David Smiley (JIRA) Tue, 30 Dec 2014 14:50:33 -0800

     [ 
https://issues.apache.org/jira/browse/SOLR-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


David Smiley updated SOLR-6581:
-------------------------------
    Attachment: renames.diff

While you are improving CollapsingQParserPlugin, I suggest renaming some 
variables.  In my work I needed something like this code, so I forked it to 
modify it, but found my first task was to do a bunch of renames so that it was 
clear what each variable was for.  The attached patch is a redacted version 
from my code and includes a bit of other stuff that can be ignored, but see the 
changes in variable names, and a getter rename or two.  As a random example, 
"docId" is unclear: is this a global doc ID or a segment-local one?  Likewise 
for ordinals.  Arguably most of Lucene/Solr is guilty of this, but I found this 
one source file hard to penetrate until I did the renames to decipher 
what's going on.
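
To illustrate the kind of naming meant here, the following is a minimal sketch and is not taken from renames.diff. It assumes a hypothetical index of three segments, each with a "doc base" (the global doc ID of its first document). The class, method, and variable names are made up for illustration; the point is only that names such as globalDocId and segmentDocId state which ID space a value lives in.

```java
import java.util.Arrays;

public class DocIdNaming {
    // Hypothetical index with three segments. Each entry is the global doc ID
    // of the first document in that segment (its "doc base").
    static final int[] SEGMENT_DOC_BASES = {0, 4, 9};

    /** Returns the index of the segment that contains the given global doc ID. */
    static int segmentIndexOf(int globalDocId) {
        int pos = Arrays.binarySearch(SEGMENT_DOC_BASES, globalDocId);
        // On a miss, binarySearch returns -(insertionPoint) - 1; the containing
        // segment is the one just before the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }

    /** Converts a global doc ID to a doc ID local to its segment. */
    static int toSegmentDocId(int globalDocId) {
        return globalDocId - SEGMENT_DOC_BASES[segmentIndexOf(globalDocId)];
    }

    /** Converts a segment-local doc ID back to a global doc ID. */
    static int toGlobalDocId(int segmentIndex, int segmentDocId) {
        return SEGMENT_DOC_BASES[segmentIndex] + segmentDocId;
    }

    public static void main(String[] args) {
        int globalDocId = 6;
        int segmentIndex = segmentIndexOf(globalDocId);
        int segmentDocId = toSegmentDocId(globalDocId);
        System.out.println("globalDocId=" + globalDocId
                + " segmentIndex=" + segmentIndex
                + " segmentDocId=" + segmentDocId);
    }
}
```

With a bare "docId", a reader cannot tell whether the doc base has already been added or subtracted; the same ambiguity applies to segment ordinals versus top-level ordinals.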

> Prepare CollapsingQParserPlugin and ExpandComponent for 5.0
> -----------------------------------------------------------
>
>                 Key: SOLR-6581
>                 URL: https://issues.apache.org/jira/browse/SOLR-6581
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Joel Bernstein
>            Assignee: Joel Bernstein
>            Priority: Minor
>             Fix For: 5.0
>
>         Attachments: SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
> SOLR-6581.patch, renames.diff
>
>
> *Background*
> The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent 
> are optimized to work with a top level FieldCache. Top level FieldCaches have 
> a very fast docID to top-level ordinal lookup. Fast access to the top-level 
> ordinals allows for very high performance field collapsing on high 
> cardinality fields. 
> LUCENE-5666 unified the DocValues and FieldCache APIs so that the top level 
> FieldCache is no longer in regular use. Instead all top level caches are 
> accessed through MultiDocValues. 
> There are some major advantages of using the MultiDocValues rather than a top 
> level FieldCache. But there is one disadvantage: the lookup from docID to 
> top-level ordinals is slower using MultiDocValues.
> My testing has shown that *after optimizing* the CollapsingQParserPlugin code 
> to use MultiDocValues, the performance drop is around 100%.  For some use 
> cases this performance drop is a blocker.
> *What About Faceting?*
> String faceting also relies on the top level ordinals. Is faceting 
> performance affected also? My testing has shown that the faceting performance 
> is affected much less than collapsing. 
> One possible reason for this may be that field collapsing is memory bound and 
> faceting is not. So the additional memory accesses needed for MultiDocValues 
> affect field collapsing much more than faceting.
> *Proposed Solution*
> The proposed solution is to have the default Collapse and Expand algorithms 
> use MultiDocValues, but to provide an option to use a top level FieldCache if 
> the performance of MultiDocValues is a blocker.
> The proposed mechanism for switching to the FieldCache would be a new "hint" 
> parameter. If the hint parameter is set to "FAST_QUERY" then the top-level 
> FieldCache would be used for both Collapse and Expand.
> Example syntax:
> {code}
> fq={!collapse field=x hint=FAST_QUERY}
> {code}
>  
>  
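
As a rough illustration of why the docID to top-level ordinal lookup described in the Background can cost more when the ordinals are reached per segment, here is a simplified model. It is not Lucene's MultiDocValues implementation, and all class, field, and method names are hypothetical. It compares a flat array indexed by global doc ID with a lookup that first resolves the segment, then reads the segment ordinal, then maps it to a top-level ordinal.

```java
public class TopLevelOrdinalLookup {
    // Hypothetical two-segment index; values are the global doc ID of each
    // segment's first document.
    static final int[] DOC_BASES = {0, 3};

    // Per-document ordinals into each segment's own sorted term dictionary.
    static final int[][] SEGMENT_ORDS = {
        {0, 1, 0},  // segment 0, terms: ["apple", "cherry"]
        {1, 0}      // segment 1, terms: ["banana", "cherry"]
    };

    // Maps each segment ordinal to its ordinal in the merged dictionary
    // ["apple", "banana", "cherry"].
    static final int[][] SEGMENT_TO_GLOBAL_ORD = {
        {0, 2},
        {1, 2}
    };

    // What a top-level cache would hold: global doc ID -> top-level ordinal.
    static final int[] FLAT_GLOBAL_ORDS = {0, 2, 0, 2, 1};

    /** One array read per document. */
    static int flatLookup(int globalDocId) {
        return FLAT_GLOBAL_ORDS[globalDocId];
    }

    /** Segment resolution plus two dependent array reads per document. */
    static int segmentedLookup(int globalDocId) {
        int segmentIndex = 0;
        while (segmentIndex + 1 < DOC_BASES.length
                && DOC_BASES[segmentIndex + 1] <= globalDocId) {
            segmentIndex++;
        }
        int segmentDocId = globalDocId - DOC_BASES[segmentIndex];
        int segmentOrd = SEGMENT_ORDS[segmentIndex][segmentDocId];
        return SEGMENT_TO_GLOBAL_ORD[segmentIndex][segmentOrd];
    }

    public static void main(String[] args) {
        for (int globalDocId = 0; globalDocId < FLAT_GLOBAL_ORDS.length; globalDocId++) {
            System.out.println(globalDocId + ": flat=" + flatLookup(globalDocId)
                    + " segmented=" + segmentedLookup(globalDocId));
        }
    }
}
```

Both paths return the same ordinals; the segmented path simply touches more memory per document, which is consistent with the memory-bound explanation offered in the description.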


