[jira] [Created] (SOLR-13132) Improve JSON "terms" facet performance when sorted by relatedness

Michael Gibney (JIRA) Thu, 10 Jan 2019 12:04:26 -0800

Michael Gibney created SOLR-13132:
-------------------------------------

             Summary: Improve JSON "terms" facet performance when sorted by 
relatedness 
                 Key: SOLR-13132
                 URL: https://issues.apache.org/jira/browse/SOLR-13132
             Project: Solr
          Issue Type: Improvement
      Security Level: Public (Default Security Level. Issues are Public)
          Components: Facet Module
    Affects Versions: 7.4, master (9.0)
            Reporter: Michael Gibney



When sorting buckets by {{relatedness}}, the JSON "terms" facet must calculate 
{{relatedness}} for every term. 

The current implementation uses a standard uninverted approach (either 
{{docValues}} or {{UnInvertedField}}) to get facet counts over the domain base 
docSet. It then uses that initial pass as a pre-filter for a second-pass, 
inverted approach: fetching the docSet for each relevant term (i.e., {{count > 
minCount}}?) and calculating the size of its intersection with the domain 
base docSet.
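
As a rough illustration of where the second-pass cost comes from, here is a 
toy Java model of the per-term intersection step. It is not Solr code: 
{{PerTermIntersectionSketch}}, {{secondPass}}, and the use of 
{{java.util.BitSet}} as a stand-in for a docSet are all assumptions made for 
this sketch, as is the {{count >= minCount}} reading of the pre-filter.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

public class PerTermIntersectionSketch {

    // Intersection size of two doc sets, leaving both inputs unmodified.
    static int intersectionSize(BitSet a, BitSet b) {
        BitSet copy = (BitSet) a.clone();
        copy.and(b);
        return copy.cardinality();
    }

    // postings:   term -> docs containing that term (the "inverted" view).
    // baseCounts: term -> count over the base domain, as produced by a first pass.
    // Returns term -> {fgCount, bgCount} for terms that survive the pre-filter.
    static Map<String, int[]> secondPass(Map<String, BitSet> postings,
                                         Map<String, Integer> baseCounts,
                                         BitSet fg, BitSet bg, int minCount) {
        Map<String, int[]> result = new TreeMap<>();
        for (Map.Entry<String, Integer> e : baseCounts.entrySet()) {
            if (e.getValue() < minCount) {
                continue; // assumption: keep terms with count >= minCount
            }
            BitSet termDocs = postings.get(e.getKey()); // one docSet fetch per term
            result.put(e.getKey(), new int[] {
                intersectionSize(termDocs, fg), // one intersection per term per set
                intersectionSize(termDocs, bg)
            });
        }
        return result;
    }
}
```

In this model the work grows with the number of surviving terms times the 
cost of a set intersection, which is the overhead described above for 
high-cardinality fields.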

Over high-cardinality fields, the overhead of per-term docSet creation and set 
intersection operations increases request latency to the point where 
relatedness sort may not be usable in practice. For my use case, even after 
applying the patch for SOLR-13108, for a field with ~220k unique terms per 
core, QTimes for high-cardinality domain docSets were, e.g.: cardinality 
1816684 = 9000ms, cardinality 5032902 = 18000ms.

The attached patch brings the above example QTimes down to a manageable ~300ms 
and ~250ms respectively. The approach calculates uninverted facet counts over 
domain base, foreground, and background docSets in parallel in a single pass. 
This allows us to take advantage of the efficiencies built into the standard 
uninverted {{FacetFieldProcessorByArray[DV|UIF]}}, and avoids the per-term 
docSet creation and set intersection overhead.
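
The single-pass idea can be sketched with a toy Java model. This is not the 
attached patch and not Solr's {{FacetFieldProcessorByArray}} code: 
{{SinglePassCountSketch}}, {{countAll}}, the {{int[][]}} doc-to-ordinal array, 
and the use of {{java.util.BitSet}} for docSets are assumptions made purely 
for illustration.

```java
import java.util.BitSet;

public class SinglePassCountSketch {

    // docTermOrds[doc] lists the term ordinals for that doc (the "uninverted" view).
    // Returns counts[set][ord], where set 0 = base, 1 = foreground, 2 = background.
    static int[][] countAll(int[][] docTermOrds, int numTerms,
                            BitSet base, BitSet fg, BitSet bg) {
        int[][] counts = new int[3][numTerms];
        for (int doc = 0; doc < docTermOrds.length; doc++) {
            boolean inBase = base.get(doc);
            boolean inFg = fg.get(doc);
            boolean inBg = bg.get(doc);
            if (!inBase && !inFg && !inBg) {
                continue; // doc belongs to none of the three sets
            }
            // Each doc's ordinals are visited once and credited to every set it is in.
            for (int ord : docTermOrds[doc]) {
                if (inBase) counts[0][ord]++;
                if (inFg) counts[1][ord]++;
                if (inBg) counts[2][ord]++;
            }
        }
        return counts;
    }
}
```

In this model the cost depends on the number of documents and their term 
ordinals rather than on building and intersecting one docSet per term.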



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
