[ https://issues.apache.org/jira/browse/SOLR-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134436#comment-17134436 ]
Michael Gibney commented on SOLR-13132:
---------------------------------------

Ah, no worries! Though in light of this discussion I re-thought the "Shim" and just pushed a commit (e7bb94922fbf10e501a006e672cec483a0f21b56) that:
# removes the Shim class entirely (it was only ever a convenience that saved a handful of lines of code in non-hot parts of the {{collect}} methods; but it came at the expense of even _more_ lines of a "Shim" class overriding all methods to ensure that it's not capable of doing any of the things its parent class does ... not nice, to be sure).
# preserves the ability (e.g., of subclasses) to set {{countAccs}} that don't support sweep collection.
# allows {{DEV_NULL_COUNT_ACC}} to remain a direct subclass of {{CountSlotAcc}}, so that it can be used without inadvertently implying that sweeping should be supported.

I think this most recent iteration (commit e7bb94922fbf10e501a006e672cec483a0f21b56) would be my preference, as long as it addresses your concerns. But if you _prefer_ to require that a custom {{countAcc}} must support sweeping, that would be fine too. I wasn't able to think of an actual use case or other argument for _specifically_ supporting non-sweep {{countAcc}} in {{FacetFieldProcessorByArray*}}, other than "don't remove support for something without a compelling reason" and "make {{DEV_NULL_COUNT_ACC}} potentially useful in unanticipated contexts, without forcing it to imply sweep collection support".

It sounds like we're on the same page wrt the "which slot is allBuckets when sweeping?" question (and its possible solutions), and I agree it makes sense to punt on settling that question for the moment. Accordingly, I'll take a pass at addressing some of the other outstanding nocommits ...
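The arrangement the commit describes (no Shim; sweep support is opt-in; {{DEV_NULL_COUNT_ACC}} stays a plain {{CountSlotAcc}}) might be modeled roughly as below. This is a minimal, self-contained Java sketch, not Solr's actual code: {{CountSlotAcc}} and {{DEV_NULL_COUNT_ACC}} here are simplified stand-ins for the real ones, and {{SweepCapable}}, {{ArrayCountAcc}}, {{SweepingArrayCountAcc}}, {{CountAccs}}, and {{canSweep}} are hypothetical names used only for illustration.

```java
// Simplified stand-in for Solr's CountSlotAcc (not the real class):
// accumulates a document count per facet slot.
abstract class CountSlotAcc {
    abstract void incrementCount(int slot, int delta);

    abstract int getCount(int slot);
}

// Hypothetical opt-in capability: a processor sweeps only when its countAcc
// implements this, so a plain CountSlotAcc never has to suppress inherited
// sweep behavior (the job the removed "Shim" class used to do).
interface SweepCapable {
    // Hypothetical: how many extra domains (e.g. foreground, background)
    // this accumulator can count alongside the base domain in one pass.
    int sweepDomains();
}

// Plain array-backed counter that does not support sweeping.
class ArrayCountAcc extends CountSlotAcc {
    private final int[] counts;

    ArrayCountAcc(int numSlots) {
        counts = new int[numSlots];
    }

    @Override
    void incrementCount(int slot, int delta) {
        counts[slot] += delta;
    }

    @Override
    int getCount(int slot) {
        return counts[slot];
    }
}

// Opts in to sweeping; ordinary counting is inherited unchanged.
class SweepingArrayCountAcc extends ArrayCountAcc implements SweepCapable {
    private final int domains;

    SweepingArrayCountAcc(int numSlots, int domains) {
        super(numSlots);
        this.domains = domains;
    }

    @Override
    public int sweepDomains() {
        return domains;
    }
}

final class CountAccs {
    private CountAccs() {}

    // Direct subclass of CountSlotAcc that discards all counts. Because it
    // does not implement SweepCapable, using it never implies sweep support.
    static final CountSlotAcc DEV_NULL_COUNT_ACC = new CountSlotAcc() {
        @Override
        void incrementCount(int slot, int delta) {
            // intentionally a no-op
        }

        @Override
        int getCount(int slot) {
            return 0;
        }
    };

    // Hypothetical processor-side check: sweep only if the configured
    // countAcc opts in; otherwise use ordinary (non-sweep) collection.
    static boolean canSweep(CountSlotAcc countAcc) {
        if (!(countAcc instanceof SweepCapable)) {
            return false;
        }
        return ((SweepCapable) countAcc).sweepDomains() > 0;
    }
}
```

The design choice this illustrates is that sweeping is something an accumulator declares, rather than something every subclass inherits and must then disable; a non-sweeping {{countAcc}} (including the dev-null one) needs no overrides at all.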
> Improve JSON "terms" facet performance when sorted by relatedness
> ------------------------------------------------------------------
>
> Key: SOLR-13132
> URL: https://issues.apache.org/jira/browse/SOLR-13132
> Project: Solr
> Issue Type: Improvement
> Components: Facet Module
> Affects Versions: 7.4, master (9.0)
> Reporter: Michael Gibney
> Priority: Major
> Attachments: SOLR-13132-with-cache-01.patch, SOLR-13132-with-cache.patch, SOLR-13132.patch, SOLR-13132_testSweep.patch
>
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> When sorting buckets by {{relatedness}}, the JSON "terms" facet must calculate {{relatedness}} for every term.
> The current implementation uses a standard uninverted approach (either {{docValues}} or {{UnInvertedField}}) to get facet counts over the domain base docSet, and then uses that initial pass as a pre-filter for a second-pass, inverted approach of fetching docSets for each relevant term (i.e., {{count > minCount}}?) and calculating the intersection size of those sets with the domain base docSet.
> Over high-cardinality fields, the overhead of per-term docSet creation and set intersection operations increases request latency to the point where relatedness sort may not be usable in practice. For my use case (a field with ~220k unique terms per core, even after applying the patch for SOLR-13108), QTimes for high-cardinality domain docSets were, e.g.: cardinality 1816684 = 9000ms, cardinality 5032902 = 18000ms.
> The attached patch brings the above example QTimes down to a manageable ~300ms and ~250ms respectively. The approach calculates uninverted facet counts over the domain base, foreground, and background docSets in parallel in a single pass. This allows us to take advantage of the efficiencies built into the standard uninverted {{FacetFieldProcessorByArray[DV|UIF]}}, and avoids the per-term docSet creation and set intersection overhead.
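To make the contrast between the two approaches in the description concrete, here is a toy Java sketch under stated assumptions: the uninverted field is modeled as a hypothetical {{int[][] docOrds}} array (doc id to that doc's distinct term ordinals, standing in for {{docValues}} or {{UnInvertedField}}), and docSets are modeled as {{java.util.BitSet}}. {{SweepCountSketch}}, {{sweepCounts}}, and {{perTermIntersectionCounts}} are hypothetical names; this is not the patch's implementation.

```java
import java.util.BitSet;

// Toy model, not Solr code: docOrds[doc] lists the (distinct) term ordinals
// of each document, standing in for docValues / UnInvertedField.
final class SweepCountSketch {
    private SweepCountSketch() {}

    // Single pass over the uninverted data: for every (doc, ord) pair,
    // bump the counter of each docSet (e.g. base, foreground, background)
    // that contains the doc. Result: counts[setIndex][ord].
    static int[][] sweepCounts(int[][] docOrds, int numTerms, BitSet... docSets) {
        int[][] counts = new int[docSets.length][numTerms];
        for (int doc = 0; doc < docOrds.length; doc++) {
            for (int ord : docOrds[doc]) {
                for (int s = 0; s < docSets.length; s++) {
                    if (docSets[s].get(doc)) {
                        counts[s][ord]++;
                    }
                }
            }
        }
        return counts;
    }

    // Per-term alternative: materialize each term's docSet, then intersect it
    // with the domain. Building one set per term is the overhead that grows
    // with field cardinality.
    static int[] perTermIntersectionCounts(int[][] docOrds, int numTerms, BitSet domain) {
        int[] counts = new int[numTerms];
        for (int ord = 0; ord < numTerms; ord++) {
            // Toy: derive the term's docs by scanning; a real index would
            // read the term's postings instead.
            BitSet termDocs = new BitSet(docOrds.length);
            for (int doc = 0; doc < docOrds.length; doc++) {
                for (int o : docOrds[doc]) {
                    if (o == ord) {
                        termDocs.set(doc);
                        break;
                    }
                }
            }
            termDocs.and(domain);
            counts[ord] = termDocs.cardinality();
        }
        return counts;
    }
}
```

For any given docSet, both methods produce the same per-term counts; the single pass gets counts for all three docSets at once without building a docSet per term, which is the overhead the description identifies as the bottleneck on high-cardinality fields.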