[ 
https://issues.apache.org/jira/browse/SOLR-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17106560#comment-17106560
 ] 

Michael Gibney commented on SOLR-13132:
---------------------------------------

I just pushed several commits (each with "SOLR-14467" in the commit message). 
The commits should be fairly digestible. They illustrate (and, I believe, fix) 
the problem, but they are likely not appropriate for use as-is; I tried to mark 
the relevant places with nocommit messages accordingly.

I did the work on this branch rather than at SOLR-14467 because the testing 
built out for this issue (SOLR-13132) was helpful. More generally, comparing 
consistency across sweep/non-sweep implementations helped inform how _best_ to 
address the issues raised by {{allBuckets}}, even in a strictly non-sweep 
context. That said, I tried to separate things out to make it clear which part 
of the fix would likely be applicable to the current master branch (I think the 
relevant commit to "backport" to master would be 
22446b126de3a6d66c8a9270e1d583d89b07865c).

I think that the use of {{RelatednessAgg}} in {{allBuckets}} may be 
fundamentally incompatible with deferred ({{otherAccs}}) collection. The 
approach I took to address this is to prevent {{RelatednessAgg}} from being 
deferred when {{allBuckets=true}}. Another possibility, not entirely thought 
through, would be to somehow make {{RelatednessAgg}} aware of when it's being 
used in a deferred ({{otherAccs}}) context, and have it cumulatively track 
{{allBuckets}} data in a way that is not reset by calls to 
{{SKGSlotAcc.reset()}}. I don't quite see how that would work, though, and my 
confusion at this point centers on how any single {{otherAcc}} with 
{{numSlots==1}} can ever cumulatively track any stats for {{allBuckets}}. I'm 
probably missing something here, but in any event I'm hoping that these commits 
will prove to be a good starting point for discussion!
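
To make the concern above concrete, here is a minimal toy model of it. It is only a sketch under stated assumptions: {{ToyCountAcc}}, {{deferredSingleSlot}} and {{eagerAllBuckets}} are hypothetical names invented for illustration, and none of this is Solr's actual {{SlotAcc}} or {{SKGSlotAcc}} code. It shows a single-slot accumulator that is reset before each bucket, next to an accumulator that keeps a dedicated extra slot for the all-buckets total.

```java
// Hypothetical, simplified model; none of these classes are Solr's actual API.
class DeferredAllBucketsSketch {

    /** Toy accumulator: one counter per slot, all cleared by reset(). */
    static final class ToyCountAcc {
        private final long[] slots;

        ToyCountAcc(int numSlots) {
            slots = new long[numSlots];
        }

        void collect(int slot) {
            slots[slot]++;
        }

        long get(int slot) {
            return slots[slot];
        }

        void reset() {
            java.util.Arrays.fill(slots, 0L);
        }
    }

    /**
     * Deferred-style collection (assumed shape): one single-slot accumulator is
     * reused for every bucket and reset in between. Returns what slot 0 holds
     * after the last bucket has been processed.
     */
    static long deferredSingleSlot(int[] docsPerBucket) {
        ToyCountAcc acc = new ToyCountAcc(1);
        for (int docs : docsPerBucket) {
            acc.reset(); // discards whatever earlier buckets contributed
            for (int i = 0; i < docs; i++) {
                acc.collect(0);
            }
        }
        return acc.get(0);
    }

    /** Non-deferred collection: an extra slot accumulates across all buckets. */
    static long eagerAllBuckets(int[] docsPerBucket) {
        int allBucketsSlot = docsPerBucket.length;
        ToyCountAcc acc = new ToyCountAcc(docsPerBucket.length + 1);
        for (int b = 0; b < docsPerBucket.length; b++) {
            for (int i = 0; i < docsPerBucket[b]; i++) {
                acc.collect(b);              // per-bucket stat
                acc.collect(allBucketsSlot); // cumulative all-buckets stat
            }
        }
        return acc.get(allBucketsSlot);
    }
}
```

In this toy model, the single-slot accumulator only ever reflects the most recently processed bucket, whereas the non-deferred accumulator retains the total across buckets. Whether the real deferred path behaves the same way is exactly the open question raised above.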

> Improve JSON "terms" facet performance when sorted by relatedness 
> ------------------------------------------------------------------
>
>                 Key: SOLR-13132
>                 URL: https://issues.apache.org/jira/browse/SOLR-13132
>             Project: Solr
>          Issue Type: Improvement
>          Components: Facet Module
>    Affects Versions: 7.4, master (9.0)
>            Reporter: Michael Gibney
>            Priority: Major
>         Attachments: SOLR-13132-with-cache-01.patch, 
> SOLR-13132-with-cache.patch, SOLR-13132.patch, SOLR-13132_testSweep.patch
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When sorting buckets by {{relatedness}}, JSON "terms" facet must calculate 
> {{relatedness}} for every term. 
> The current implementation uses a standard uninverted approach (either 
> {{docValues}} or {{UnInvertedField}}) to get facet counts over the domain 
> base docSet, and then uses that initial pass as a pre-filter for a 
> second-pass, inverted approach of fetching docSets for each relevant term 
> (i.e., {{count > minCount}}?) and calculating intersection size of those sets 
> with the domain base docSet.
> Over high-cardinality fields, the overhead of per-term docSet creation and 
> set intersection operations increases request latency to the point where 
> relatedness sort may not be usable in practice (for my use case, even after 
> applying the patch for SOLR-13108, for a field with ~220k unique terms per 
> core, QTimes for high-cardinality domain docSets were, e.g.: cardinality 
> 1816684=9000ms, cardinality 5032902=18000ms).
> The attached patch brings the above example QTimes down to a manageable 
> ~300ms and ~250ms respectively. The approach calculates uninverted facet 
> counts over domain base, foreground, and background docSets in parallel in a 
> single pass. This allows us to take advantage of the efficiencies built into 
> the standard uninverted {{FacetFieldProcessorByArray[DV|UIF]}}, and avoids 
> the per-term docSet creation and set intersection overhead.
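
The single-pass idea in the quoted description can be sketched roughly as follows. This is an illustration only, not the attached patch: {{SweepCountSketch}} and {{sweepCount}} are hypothetical names, {{java.util.BitSet}} stands in for Solr's docSets, and a plain {{int[]}} mapping each document to one term ordinal stands in for a single-valued {{docValues}} field.

```java
// Hypothetical sketch; not the attached patch and not Solr's actual API.
class SweepCountSketch {

    /**
     * Counts, in one pass over the documents, how many documents of each term
     * fall into the base, foreground, and background sets.
     *
     * @param docToTerm docToTerm[doc] is the term ordinal of that document
     *                  (assumes a single-valued field)
     * @param numTerms  number of distinct term ordinals
     * @return counts[set][term], where set 0 = base, 1 = foreground, 2 = background
     */
    static int[][] sweepCount(int[] docToTerm, int numTerms,
                              java.util.BitSet base,
                              java.util.BitSet fg,
                              java.util.BitSet bg) {
        int[][] counts = new int[3][numTerms];
        for (int doc = 0; doc < docToTerm.length; doc++) {
            int term = docToTerm[doc];
            // One lookup of the term ordinal feeds all three counters,
            // so no per-term docSet is built and no set intersection is run.
            if (base.get(doc)) counts[0][term]++;
            if (fg.get(doc)) counts[1][term]++;
            if (bg.get(doc)) counts[2][term]++;
        }
        return counts;
    }
}
```

The per-term foreground and background counts that a relatedness calculation needs are then available directly from the count arrays, rather than from one docSet intersection per term.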



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
