[ https://issues.apache.org/jira/browse/SOLR-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17097725#comment-17097725 ]
Chris M. Hostetter commented on SOLR-13132: ------------------------------------------- {quote}... If you don't think it's a concern then it's no concern to me (other than that preserving order seems achievable here.. {quote} Since it's already impossible to ensure keys maintain consistent order as the processor/options change, Let's deliberately not concern ourselves with maintaining them if/when sweep is used in the interest in keeping the code simpler. (If/When someone cares about key order consistency down hte road, they'll have to tackle it at a much higher level then just sweeping – and in theory can do it after the buckets have been populated by re-ordering the key/val pairs) {quote}...what would you think about modifying the method signature of SlotAcc.setValues(SimpleOrderedMap<Object> bucket, int slotNum) to return boolean instead of void, in the spirit of the setValues(...) method recently added to the SweepingAcc class? Or maybe less invasive, adding an analogous method to CountSlotAcc that returns boolean? ... {quote} Given that we seem to agree moving towards {{SweepingCountSlotAcc extends CountSlotArrAcc}} as a replacement for {{SweepingAcc}} then id prefer we avoid changing the {{CountSlotAcc}} API at all. {quote}{quote}unless there's another use for CollectSlotAccMappingAware i'm overlooking? {quote} The main/initial purpose of this was to provide an interface for notifying FacetProcessors when SlotAccs had been swapped out. ... {quote} Ugh, sorry ... that was a half thought out idea that i phrased in the form of a bad, and overly brief, question. I understand the current usage of FacetFieldProcessorSlotAccMapper (as a concrete implementation of CollectSlotAccMappingAware) that handles remapping of the sortAcc – what i was mulling was... * Assuming: ** we refactor CountSlotAcc's sweeping logic + SweepingAcc into a new {{SweepingCountSlotAcc}} ** you have no other purpose/usecase in mind for CollectSlotAccMappingAware as an API/abstraction * Then: ** {{SweepingCountSlotAcc}} 's constructor could take in the {{FacetFieldProcessor}} ('this') directly ** {{SweepingCountSlotAcc}} 's version of {{registerMapping()}} could directly manipulate {{processor.sortAcc}} when/if it gets mapped ...but as i said: that idea was half thought out, it's a refactoring to consider down the road depending on how some of hte other more significant API discussions we've been having shake out. Don't worry about it. {quote}... but I agree and I think your suggestion of "Perhaps a single initSweeping(CollectSlotAccMappingAware notify) for use by processors, that fails if called more then once? All other code paths use getBaseSweepingAcc()" is a good way forward. I was waiting to address that until after discussion of these other questions. {quote} But we shouldn't actually need either of those methods anymore if we go the {{SweepingCountSlotAcc}} route ... correct? * {{countAcc.initSweeping(notify)}} can now just be {{countAcc = new SweepingCountSlotAcc(...)}} * {{countAcc.getBaseSweepingAcc()}} can now just be: ** {{(SweepingCountSlotAcc)countAcc}} when passed as an arg to {{SweepableAcc.registerSweepingAccs(...)}} or when doing the actual sweeping ** {{countAcc}} when the purpose is to call {{setValues}} (from FacetFieldProcessor, which should no longer need to have any idea that sweeping is even a concept) ...right? {quote}This seems reasonable. If you don't mind, I can take a crack at seeing what this might look like. I'm particularly interested in doing this in a way that keeps the "sweeping" stuff modular (under the hood, at least), since it's possible that other concrete CountSlotAcc subclasses (whose core "count" functionality is substantially different) could still use the exact same code for coordinating sweeping. (I'm thinking particularly about cached counts ... definitely conscious of not having that bleed into this issue, just trying to be transparent!). {quote} Sure, go for it – but i would urge you not to go too overboard on modularity at this point: if all the details are nicely encapsulated inside a single new {{SweepingCountSlotAcc}} we can always refactor them out into more modular classes later if/when there is a need for that modularity. In the same spirit of being transparent: I'm trying to keep the number of code paths touched by these changes (and the number of new APIS/abstractions/etc...) at a minimum in order to increase confidence in the correctness and reduce the likelihood of introducing bugs or unforseen side effects. (I'd rather focus on releasing a bicycle soon, and later have a dicussion about refactoring it into a car; then stall on releasing a bicycle while we spend a lot of time pre-planning for a future car) Also: we should update your branch to merge in the current apache/master ... there's at least one (small) conflict i know of in the faceting code related to standardizing on using "long" values for count that will probably be easier to resolve sooner then later, and I'd like to get the fix for the changes that bit me in SOLR-14452 so that i can be more confident in beasting tests as we refactor things. ... if you don't object (or haven't already merged master in) by the next time i look at your branch (hopefully monday) i'll merge & push to your branch. > Improve JSON "terms" facet performance when sorted by relatedness > ------------------------------------------------------------------ > > Key: SOLR-13132 > URL: https://issues.apache.org/jira/browse/SOLR-13132 > Project: Solr > Issue Type: Improvement > Components: Facet Module > Affects Versions: 7.4, master (9.0) > Reporter: Michael Gibney > Priority: Major > Attachments: SOLR-13132-with-cache-01.patch, > SOLR-13132-with-cache.patch, SOLR-13132.patch, SOLR-13132_testSweep.patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > When sorting buckets by {{relatedness}}, JSON "terms" facet must calculate > {{relatedness}} for every term. > The current implementation uses a standard uninverted approach (either > {{docValues}} or {{UnInvertedField}}) to get facet counts over the domain > base docSet, and then uses that initial pass as a pre-filter for a > second-pass, inverted approach of fetching docSets for each relevant term > (i.e., {{count > minCount}}?) and calculating intersection size of those sets > with the domain base docSet. > Over high-cardinality fields, the overhead of per-term docSet creation and > set intersection operations increases request latency to the point where > relatedness sort may not be usable in practice (for my use case, even after > applying the patch for SOLR-13108, for a field with ~220k unique terms per > core, QTime for high-cardinality domain docSets were, e.g.: cardinality > 1816684=9000ms, cardinality 5032902=18000ms). > The attached patch brings the above example QTimes down to a manageable > ~300ms and ~250ms respectively. The approach calculates uninverted facet > counts over domain base, foreground, and background docSets in parallel in a > single pass. This allows us to take advantage of the efficiencies built into > the standard uninverted {{FacetFieldProcessorByArray[DV|UIF]}}), and avoids > the per-term docSet creation and set intersection overhead. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org