[ https://issues.apache.org/jira/browse/SOLR-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17090931#comment-17090931 ]
Michael Gibney commented on SOLR-13132: --------------------------------------- Thanks, Hoss! Some initial responses re: some of the nocommit comments from [8fcd6271b6|https://github.com/magibney/lucene-solr/commit/8fcd6271b684da589ddae8e4319b564249ee76cb]: {code:java} // nocommit: for that matter: can we eliminate SweepingAcc as a class, // nocommit: and just roll that specific logic into CountSlotAcc? // nocommit: IIUC: there should only ever be a single SweepingAcc instance, // nocommit: and callers should never use/instantiate a SweepingAcc w/o using 'countAcc' ... correct? {code} That would definitely work. However, my initial inclination is to prefer leaving {{SweepingAcc}} as a separate class, because {{CountSlotAcc}} currently clearly does one specific thing, and folding the {{SweepingAcc}} functionality (which could be relatively complex -- potentially deduping {{DocSets}}, etc...) would mix in a different type of functionality that's only relevant in some of the contexts where {{CountSlotAcc}} is currently used. Accessing {{SweepingAcc}} via {{countAcc.getBaseSweepingAcc()}} strikes a balance of using {{countAcc}} as a coordination point for related but distinct functionality ... perhaps a cleaner separation of concerns? {code:java} // nocommit: since 'countAcc' is now the special place all sweeping is tracked, it seems // nocommit: unneccessary (and uneccessarly confusing) for it to also be a 'SweepableSlotAcc' // nocommit: any reason not to just remove this? abstract class CountSlotAcc extends SlotAcc implements ReadOnlyCountSlotAcc /*, SweepableSlotAcc<CountSlotAcc> ... nocommit... */ { ... // nocommit: CountSlotAcc no longer implements SweepableSlotAcc... // @Override // public CountSlotAcc registerSweepingAccs(SweepingAcc baseSweepingAcc) { // baseSweepingAcc.add(new SweepCountAccStruct(fcontext.base, false, this, this)); // baseSweepingAcc.registerMapping(this, this); // return null; // } {code} True, I'm glad you mentioned this. I left this in partly to illustrate another concrete case (aside from SKG) for which sweep collection might be useful. In its current state it admittedly seems a bit contrived, but my thinking was: although {{countAcc}} is currently the one and only {{CountSlotAcc}}, used to accumulate counts over the base domain {{DocSet}} only, there could be cases where extra {{CountSlotAccs}} are used more directly (e.g. as part of stats collection, analogous to how they're used indirectly for SKG sweep collection). In such a case, these "non-base" {{CountSlotAccs}} would respond as implemented in the above {{registerSweepingAccs(...)}} method. More practically speaking, it also occurred to me that one promising use of sweep collection would be to accumulate counts over all subfacet domains in a single sweep (for nested/sub-facets, pivot facets, what-have-you) -- not sure if that would be directly accommodated by the current incarnation of the "sweeping", but it might be a use case to consider. With all that said, I'm not at all opposed to removing the {{SweepableSlotAcc}} interface from {{CountSlotAcc}}; it should anyway be straightforward to add back in later should the need arise. > Improve JSON "terms" facet performance when sorted by relatedness > ------------------------------------------------------------------ > > Key: SOLR-13132 > URL: https://issues.apache.org/jira/browse/SOLR-13132 > Project: Solr > Issue Type: Improvement > Components: Facet Module > Affects Versions: 7.4, master (9.0) > Reporter: Michael Gibney > Priority: Major > Attachments: SOLR-13132-with-cache-01.patch, > SOLR-13132-with-cache.patch, SOLR-13132.patch, SOLR-13132_testSweep.patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > When sorting buckets by {{relatedness}}, JSON "terms" facet must calculate > {{relatedness}} for every term. > The current implementation uses a standard uninverted approach (either > {{docValues}} or {{UnInvertedField}}) to get facet counts over the domain > base docSet, and then uses that initial pass as a pre-filter for a > second-pass, inverted approach of fetching docSets for each relevant term > (i.e., {{count > minCount}}?) and calculating intersection size of those sets > with the domain base docSet. > Over high-cardinality fields, the overhead of per-term docSet creation and > set intersection operations increases request latency to the point where > relatedness sort may not be usable in practice (for my use case, even after > applying the patch for SOLR-13108, for a field with ~220k unique terms per > core, QTime for high-cardinality domain docSets were, e.g.: cardinality > 1816684=9000ms, cardinality 5032902=18000ms). > The attached patch brings the above example QTimes down to a manageable > ~300ms and ~250ms respectively. The approach calculates uninverted facet > counts over domain base, foreground, and background docSets in parallel in a > single pass. This allows us to take advantage of the efficiencies built into > the standard uninverted {{FacetFieldProcessorByArray[DV|UIF]}}), and avoids > the per-term docSet creation and set intersection overhead. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org