Michael Gibney created SOLR-13132:
-------------------------------------
Summary: Improve JSON "terms" facet performance when sorted by
relatedness
Key: SOLR-13132
URL: https://issues.apache.org/jira/browse/SOLR-13132
Project: Solr
Issue Type: Improvement
Security Level: Public (Default Security Level. Issues are Public)
Components: Facet Module
Affects Versions: 7.4, master (9.0)
Reporter: Michael Gibney
When sorting buckets by {{relatedness}}, JSON "terms" facet must calculate
{{relatedness}} for every term.
The current implementation uses a standard uninverted approach (either
{{docValues}} or {{UnInvertedField}}) to get facet counts over the domain base
docSet, and then uses that initial pass as a pre-filter for a second-pass,
inverted approach of fetching docSets for each relevant term (i.e., {{count >
minCount}}?) and calculating intersection size of those sets with the domain
base docSet.
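The two-pass flow described above can be sketched roughly as follows. This is an illustrative stand-in, not Solr's code: the class {{TwoPassSketch}}, the use of {{java.util.BitSet}} in place of Solr docSets, the {{docTerms}} and {{postings}} arrays, and the choice to intersect against a foreground and a background set are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the two-pass approach; none of these names exist in Solr.
class TwoPassSketch {

    /**
     * docTerms[doc] = term ordinals of that document (uninverted view).
     * postings[ord] = ids of the documents containing that term (inverted view).
     * Returns, per surviving term ordinal, {foreground overlap, background overlap}.
     */
    static Map<Integer, int[]> overlaps(int[][] docTerms, int[][] postings,
                                        BitSet base, BitSet fg, BitSet bg, int minCount) {
        // Pass 1: uninverted counting over the domain base set.
        int[] baseCounts = new int[postings.length];
        for (int doc = base.nextSetBit(0); doc >= 0; doc = base.nextSetBit(doc + 1)) {
            for (int ord : docTerms[doc]) {
                baseCounts[ord]++;
            }
        }
        // Pass 2: for each term that survives the pre-filter, build its doc set
        // and intersect it. This per-term work is the overhead the issue describes.
        Map<Integer, int[]> result = new LinkedHashMap<>();
        for (int ord = 0; ord < postings.length; ord++) {
            if (baseCounts[ord] < minCount) {
                continue; // assumed threshold; the issue itself is unsure of the exact test
            }
            BitSet termDocs = new BitSet();
            for (int doc : postings[ord]) {
                termDocs.set(doc);
            }
            result.put(ord, new int[] {
                intersectionSize(termDocs, fg), intersectionSize(termDocs, bg)});
        }
        return result;
    }

    // Size of the intersection, computed on a copy so the inputs stay untouched.
    static int intersectionSize(BitSet a, BitSet b) {
        BitSet copy = (BitSet) a.clone();
        copy.and(b);
        return copy.cardinality();
    }

    public static void main(String[] args) {
        int[][] docTerms = {{0, 1}, {0}, {1, 2}, {0, 2}};
        int[][] postings = {{0, 1, 3}, {0, 2}, {2, 3}};
        BitSet base = new BitSet(); base.set(0, 3);
        BitSet fg = new BitSet(); fg.set(0, 2);
        BitSet bg = new BitSet(); bg.set(0, 4);
        overlaps(docTerms, postings, base, fg, bg, 2).forEach(
            (ord, sizes) -> System.out.println(ord + " -> " + Arrays.toString(sizes)));
    }
}
```

The cost of the second loop grows with the number of surviving terms, since each one needs its own set and its own intersections.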
Over high-cardinality fields, the overhead of per-term docSet creation and set
intersection operations increases request latency to the point where
relatedness sort may not be usable in practice. For my use case, even after
applying the patch for SOLR-13108, for a field with ~220k unique terms per
core, QTimes for high-cardinality domain docSets were, e.g., 9000ms at
cardinality 1816684 and 18000ms at cardinality 5032902.
The attached patch brings the above example QTimes down to a manageable ~300ms
and ~250ms respectively. The approach calculates uninverted facet counts over
domain base, foreground, and background docSets in parallel in a single pass.
This allows us to take advantage of the efficiencies built into the standard
uninverted processors ({{FacetFieldProcessorByArray[DV|UIF]}}), and avoids the
per-term docSet creation and set intersection overhead.
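A minimal sketch of the single-pass idea, under the same kind of assumptions: {{SinglePassSketch}}, {{countAll}}, the {{docTerms}} array, and {{java.util.BitSet}} as a docSet stand-in are hypothetical and are not taken from the attached patch. It only illustrates accumulating per-term counts for several doc sets while walking the uninverted data once.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical sketch of single-pass counting; not the patch's actual code.
class SinglePassSketch {

    /**
     * docTerms[doc] = term ordinals of that document (uninverted view).
     * Returns counts[s][ord]: how many documents of sets[s] contain term ord.
     */
    static int[][] countAll(int[][] docTerms, int numTerms, BitSet... sets) {
        int[][] counts = new int[sets.length][numTerms];
        // Only documents that belong to at least one set can contribute.
        BitSet union = new BitSet();
        for (BitSet s : sets) {
            union.or(s);
        }
        // One walk over the documents; every set's counters are updated together,
        // so no per-term doc set is built and no set intersection is computed.
        for (int doc = union.nextSetBit(0); doc >= 0; doc = union.nextSetBit(doc + 1)) {
            for (int ord : docTerms[doc]) {
                for (int s = 0; s < sets.length; s++) {
                    if (sets[s].get(doc)) {
                        counts[s][ord]++;
                    }
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[][] docTerms = {{0, 1}, {0}, {1, 2}, {0, 2}};
        BitSet base = new BitSet(); base.set(0, 3);
        BitSet fg = new BitSet(); fg.set(0, 2);
        BitSet bg = new BitSet(); bg.set(0, 4);
        for (int[] row : countAll(docTerms, 3, base, fg, bg)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

In this sketch the work scales with the number of documents and their term occurrences rather than with the number of distinct terms, which is the property that matters for high-cardinality fields.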