[ https://issues.apache.org/jira/browse/SOLR-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17142526#comment-17142526 ]
Chris M. Hostetter commented on SOLR-13132: ------------------------------------------- michael: I'm really sorry, i've been swamped with some other stuff and have not yet had a chance to properly & fully catch up on all of your changes since my last comment – but i did want to pop in and say: * from what i can tell at a quick skim, i really like the direction you took with removing hte shim .. seems much simpler and more straight forward to understand * i'm a little concerned that as we've made various incremental changes, {{registerSweepingAccIfSupportedByCollectAcc(SweepingCountSlotAcc)}} has evolved into a multi-headed beast... ** the original point of this method was to refactor & dedpulicate some redundent code across multiple processors... *** the primary purpose was to give subclasses a helper method to call to ensure {{collectAcc.registerSweepingAccs(sweepingCountAcc)}} got called if: a) sweepingCountAcc was a thing, b) collectAcc supported sweeping *** add bonus: it eliminated the need for duplicate instanceof/casting checks that were in multiple places ** as the code & method signature have evolved, this is now called: *** in various places that still have to do their own instanceof/casting checks of SweepingCountSlotAcc when they call it *** very late in the "lifecycle" of the processors, in way that makes it un-obvious to me if it might be getting called multiple times for the same processor? (which may not be a probem now even if they do, but might cause weird bugs down the road if/when more things support sweeping) ** i think we should: *** eliminate the arg from this API, and make it operate directly on {{this.countAcc}} (if and only if it's an instanceof SweepingCountSlotAcc, otherwise this method should be a No-Op ... this is how it worked originally IIRC) *** move _where_ this method is called so that it is called as early, and infrequently, as possible ... is there any reason we can't make it the last line of {{FacetFieldProcessorByArray.createCollectAcc()}} ? (the only other place countAcc/collectAcc might be modified are the refinement path where we *know* explicitly we're not sweepin) *** add some additional new helper methods (to {{FacetFieldProcessor(ByArray?)}} so we don't have to have these blocks of code redundently in 3 places (where {{registerSweepingAccIfSupportedByCollectAcc}} is currently being called) ... {code:java} /** returns the empty list unless this processor is not using sweeping */ public List<SweepCountAccStruct> getOtherSweepStructs() { if (countAcc instanceof SweepingCountSlotAcc) { return ((SweepingCountSlotAcc) countAcc).others; } return Collections.emptyList(); } /** returns the base sweep structure to use, or a dummy struct if sweeping is not used */ public SweepCountAccStruct getBaseSweepStruct() { if (countAcc instanceof SweepingCountSlotAcc) { return ((SweepingCountSlotAcc) countAcc).base; } return SweepCountAccStruct(fcontext.base, true, countAcc); } {code} WDYT? > Improve JSON "terms" facet performance when sorted by relatedness > ------------------------------------------------------------------ > > Key: SOLR-13132 > URL: https://issues.apache.org/jira/browse/SOLR-13132 > Project: Solr > Issue Type: Improvement > Components: Facet Module > Affects Versions: 7.4, master (9.0) > Reporter: Michael Gibney > Priority: Major > Attachments: SOLR-13132-with-cache-01.patch, > SOLR-13132-with-cache.patch, SOLR-13132.patch, SOLR-13132_testSweep.patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > When sorting buckets by {{relatedness}}, JSON "terms" facet must calculate > {{relatedness}} for every term. > The current implementation uses a standard uninverted approach (either > {{docValues}} or {{UnInvertedField}}) to get facet counts over the domain > base docSet, and then uses that initial pass as a pre-filter for a > second-pass, inverted approach of fetching docSets for each relevant term > (i.e., {{count > minCount}}?) and calculating intersection size of those sets > with the domain base docSet. > Over high-cardinality fields, the overhead of per-term docSet creation and > set intersection operations increases request latency to the point where > relatedness sort may not be usable in practice (for my use case, even after > applying the patch for SOLR-13108, for a field with ~220k unique terms per > core, QTime for high-cardinality domain docSets were, e.g.: cardinality > 1816684=9000ms, cardinality 5032902=18000ms). > The attached patch brings the above example QTimes down to a manageable > ~300ms and ~250ms respectively. The approach calculates uninverted facet > counts over domain base, foreground, and background docSets in parallel in a > single pass. This allows us to take advantage of the efficiencies built into > the standard uninverted {{FacetFieldProcessorByArray[DV|UIF]}}), and avoids > the per-term docSet creation and set intersection overhead. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org