[jira] [Updated] (SOLR-13132) Improve JSON "terms" facet performance when sorted by relatedness

Chris M. Hostetter (Jira) Wed, 08 Apr 2020 10:28:22 -0700


     [ 
https://issues.apache.org/jira/browse/SOLR-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Chris M. Hostetter updated SOLR-13132:
--------------------------------------
    Attachment: SOLR-13132_testSweep.patch
        Status: Open  (was: Open)


bq. ... In fact, otherAccs and resort, being likely to generate more DocSet 
lookups than refinement, make it all the more important that SKGSlotAcc respect 
cacheDf to control filterCache usage, no?

Off the top of my head, I'm not certain that they will involve _more_ lookups 
than refinement, but it certainly seems like if it's useful for refinement, it 
would also be useful for those cases as well.

bq. ... I plan to work through them in the next day or two and address any 
questions as they come up.

Sweet.

I went ahead and started working on an "equivalence testing" patch to try and 
help definitively prove that using {{sweep: true}} or {{sweep: false}} produces 
the same results on otherwise equivalent (randomly generated) facet requests.  
I'm attaching that as {{SOLR-13132_testSweep.patch}}.  The big missing piece 
here is a stubbed out "whitebox" test (see nocommits) to use the debug output 
to "prove" that sweep collection is actually being used when/if expected based 
on the {{sweep}} param (and effective processor).

* As is on master this test passes (because nothing looks for a {{sweep}} param 
so it's just comparing queries with themselves).
* When modifying this patch to use {{disable_sweep_collection}} it passed 
reliably from what i could tell.
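
For anyone following along, the shape of the equivalence check is roughly the following. This is only a minimal sketch, not the code in the attached patch: the class and method names are made up, the facet is modeled as a plain map, the top-level placement of the {{sweep}} key is an assumption, and the "executor" is a stand-in for whatever actually runs the request against Solr.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Random;
import java.util.function.Function;

/** Hypothetical sketch of sweep-vs-non-sweep equivalence testing; not Solr test code. */
class SweepEquivalenceSketch {

    /** Builds a small random "terms" facet spec; field names are made up. */
    static Map<String, Object> randomFacet(Random r) {
        Map<String, Object> facet = new LinkedHashMap<>();
        facet.put("type", "terms");
        facet.put("field", "field_" + r.nextInt(5));
        facet.put("limit", 1 + r.nextInt(10));
        return facet;
    }

    /** Returns a copy of the facet with an assumed top-level "sweep" flag added. */
    static Map<String, Object> withSweep(Map<String, Object> facet, boolean sweep) {
        Map<String, Object> copy = new LinkedHashMap<>(facet);
        copy.put("sweep", sweep);
        return copy;
    }

    /** Runs both variants through the executor and compares the results. */
    static <R> boolean sameResults(Map<String, Object> facet,
                                   Function<Map<String, Object>, R> executor) {
        R withSweepOn = executor.apply(withSweep(facet, true));
        R withSweepOff = executor.apply(withSweep(facet, false));
        return Objects.equals(withSweepOn, withSweepOff);
    }
}
```

Note that an executor which ignores the flag entirely will trivially report equivalence, which is the situation on master described above, and is why the "whitebox" check of the debug output still matters.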

...once your major changes to the impl are done, we'll probably want more 
changes to this test to help tickle "edge code paths" (once we have a better 
handle on what they are).  For instance: right now there is only one sweep 
using the {{relatedness()}} function per facet, but I'm pretty sure testing 
multiple sweep aggs in a single query, and mixing in some non sweep functions, 
will be important for code coverage.



> Improve JSON "terms" facet performance when sorted by relatedness 
> ------------------------------------------------------------------
>
>                 Key: SOLR-13132
>                 URL: https://issues.apache.org/jira/browse/SOLR-13132
>             Project: Solr
>          Issue Type: Improvement
>          Components: Facet Module
>    Affects Versions: 7.4, master (9.0)
>            Reporter: Michael Gibney
>            Priority: Major
>         Attachments: SOLR-13132-with-cache-01.patch, 
> SOLR-13132-with-cache.patch, SOLR-13132.patch, SOLR-13132_testSweep.patch
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When sorting buckets by {{relatedness}}, JSON "terms" facet must calculate 
> {{relatedness}} for every term. 
> The current implementation uses a standard uninverted approach (either 
> {{docValues}} or {{UnInvertedField}}) to get facet counts over the domain 
> base docSet, and then uses that initial pass as a pre-filter for a 
> second-pass, inverted approach of fetching docSets for each relevant term 
> (i.e., {{count > minCount}}?) and calculating intersection size of those sets 
> with the domain base docSet.
> Over high-cardinality fields, the overhead of per-term docSet creation and 
> set intersection operations increases request latency to the point where 
> relatedness sort may not be usable in practice (for my use case, even after 
> applying the patch for SOLR-13108, for a field with ~220k unique terms per 
> core, QTimes for high-cardinality domain docSets were, e.g.: cardinality 
> 1816684=9000ms, cardinality 5032902=18000ms).
> The attached patch brings the above example QTimes down to a manageable 
> ~300ms and ~250ms respectively. The approach calculates uninverted facet 
> counts over domain base, foreground, and background docSets in parallel in a 
> single pass. This allows us to take advantage of the efficiencies built into 
> the standard uninverted {{FacetFieldProcessorByArray[DV|UIF]}}, and avoids 
> the per-term docSet creation and set intersection overhead.
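
To illustrate the difference between the two approaches in the issue description quoted above, here is a toy sketch. It is not Solr code: doc sets are modeled as {{java.util.BitSet}}, the field is assumed single-valued with one term ordinal per doc (-1 meaning no value), and all names are hypothetical. The first method counts base, foreground, and background in one pass over the docs; the second shows the per-term docSet creation and intersection that the single pass avoids.

```java
import java.util.BitSet;

/** Toy model of single-pass "sweep" counting; illustrative only, not Solr code. */
class SweepCountSketch {

    /**
     * One pass over all docs, accumulating per-term counts for three doc sets.
     * Returns counts[set][termOrd], where set 0 = base, 1 = foreground, 2 = background.
     */
    static int[][] sweepCounts(int[] termOrdByDoc, int numTerms,
                               BitSet base, BitSet fg, BitSet bg) {
        int[][] counts = new int[3][numTerms];
        for (int doc = 0; doc < termOrdByDoc.length; doc++) {
            int ord = termOrdByDoc[doc];
            if (ord < 0) {
                continue; // doc has no value for this field
            }
            if (base.get(doc)) counts[0][ord]++;
            if (fg.get(doc)) counts[1][ord]++;
            if (bg.get(doc)) counts[2][ord]++;
        }
        return counts;
    }

    /**
     * Per-term alternative: build the doc set for a single term, then intersect
     * it with the given set. Repeating this for every term is the overhead
     * that the single pass above avoids.
     */
    static int intersectionCount(int[] termOrdByDoc, int ord, BitSet set) {
        BitSet termDocs = new BitSet(termOrdByDoc.length);
        for (int doc = 0; doc < termOrdByDoc.length; doc++) {
            if (termOrdByDoc[doc] == ord) {
                termDocs.set(doc);
            }
        }
        termDocs.and(set);
        return termDocs.cardinality();
    }
}
```

Both methods should agree on any given term and set; the point of the sketch is only that the first visits each doc once regardless of how many terms there are.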



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
