[ https://issues.apache.org/jira/browse/SOLR-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Smiley updated SOLR-6581:
-------------------------------
    Attachment: renames.diff

While you are improving CollapsingQParserPlugin, I suggest doing some variable renames. In my work I needed something like this code, so I forked it to modify it, but found my first task was a round of renames so that it was clear what each variable was for. The attached patch is a redacted version of my code and includes a bit of unrelated material that can be ignored, but see the changes in variable names and a getter rename or two. As a random example, "docId" is unclear: is it a global doc ID or a segment-local one? Likewise for ordinals. Arguably most of Lucene/Solr is guilty of this, but I found this one source file hard to penetrate until I did the renames to decipher what's going on.

> Prepare CollapsingQParserPlugin and ExpandComponent for 5.0
> -----------------------------------------------------------
>
>                 Key: SOLR-6581
>                 URL: https://issues.apache.org/jira/browse/SOLR-6581
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Joel Bernstein
>            Assignee: Joel Bernstein
>            Priority: Minor
>             Fix For: 5.0
>
>         Attachments: SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch,
> SOLR-6581.patch, renames.diff
>
>
> *Background*
> The 4x implementations of the CollapsingQParserPlugin and the ExpandComponent
> are optimized to work with a top level FieldCache. Top level FieldCaches have
> a very fast docID to top-level ordinal lookup. Fast access to the top-level
> ordinals allows for very high performance field collapsing on high
> cardinality fields.
> LUCENE-5666 unified the DocValues and FieldCache APIs so that the top level
> FieldCache is no longer in regular use. Instead all top level caches are
> accessed through MultiDocValues.
> There are some major advantages of using the MultiDocValues rather than a top
> level FieldCache. But there is one disadvantage: the lookup from docId to
> top-level ordinals is slower using MultiDocValues.
> My testing has shown that *after optimizing* the CollapsingQParserPlugin code
> to use MultiDocValues, the performance drop is around 100%. For some use
> cases this performance drop is a blocker.
> *What About Faceting?*
> String faceting also relies on the top level ordinals. Is faceting
> performance affected also? My testing has shown that the faceting performance
> is affected much less than collapsing.
> One possible reason for this may be that field collapsing is memory bound and
> faceting is not. So the additional memory accesses needed for MultiDocValues
> affect field collapsing much more than faceting.
> *Proposed Solution*
> The proposed solution is to have the default Collapse and Expand algorithm
> use MultiDocValues, but to provide an option to use a top level FieldCache if
> the performance of MultiDocValues is a blocker.
> The proposed mechanism for switching to the FieldCache would be a new "hint"
> parameter. If the hint parameter is set to "FAST_QUERY" then the top-level
> FieldCache would be used for both Collapse and Expand.
> Example syntax:
> {code}
> fq={!collapse field=x hint=FAST_QUERY}
> {code}
>
>
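
To make the naming question in the comment above concrete, here is a minimal Java sketch of the difference between a global doc ID and a segment-local one, and of a two-step lookup from a doc to a top-level ordinal. Everything in it is hypothetical: the class TopLevelLookupSketch, its fields, and its array layout are for illustration only and are not taken from CollapsingQParserPlugin, MultiDocValues, or renames.diff. It assumes every segment holds at least one document, so the segment doc bases are strictly increasing.

```java
import java.util.Arrays;

/** Hypothetical illustration only; not code from Solr, Lucene, or renames.diff. */
final class TopLevelLookupSketch {

    /** Global doc ID of the first document in each segment (strictly increasing). */
    private final int[] segmentDocBases;

    /** [segmentIndex][segmentDocId] -> segment-local ordinal of that doc's value. */
    private final int[][] segmentOrdBySegmentDocId;

    /** [segmentIndex][segmentOrd] -> global (top-level) ordinal. */
    private final int[][] globalOrdBySegmentOrd;

    TopLevelLookupSketch(int[] segmentDocBases,
                         int[][] segmentOrdBySegmentDocId,
                         int[][] globalOrdBySegmentOrd) {
        this.segmentDocBases = segmentDocBases.clone();
        this.segmentOrdBySegmentDocId = segmentOrdBySegmentDocId;
        this.globalOrdBySegmentOrd = globalOrdBySegmentOrd;
    }

    /** Index of the segment that owns the given global doc ID. */
    int segmentIndexOf(int globalDocId) {
        int pos = Arrays.binarySearch(segmentDocBases, globalDocId);
        // On a miss, binarySearch returns -(insertionPoint) - 1;
        // the owning segment is the one just before the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }

    /** Converts a global doc ID to a doc ID local to its segment. */
    int toSegmentDocId(int globalDocId) {
        return globalDocId - segmentDocBases[segmentIndexOf(globalDocId)];
    }

    /** Converts a segment-local doc ID back to a global doc ID. */
    int toGlobalDocId(int segmentIndex, int segmentDocId) {
        return segmentDocBases[segmentIndex] + segmentDocId;
    }

    /** Global doc ID -> segment doc ID -> segment ordinal -> global ordinal. */
    int globalOrdOf(int globalDocId) {
        int segmentIndex = segmentIndexOf(globalDocId);
        int segmentDocId = globalDocId - segmentDocBases[segmentIndex];
        int segmentOrd = segmentOrdBySegmentDocId[segmentIndex][segmentDocId];
        return globalOrdBySegmentOrd[segmentIndex][segmentOrd];
    }
}
```

With names like these, a call such as globalOrdOf(4) states which ID space its argument lives in. The chain of array hops per lookup is also the kind of additional memory access the quoted description mentions, though real costs depend on the actual Lucene data structures rather than on this sketch.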