[
https://issues.apache.org/jira/browse/SOLR-5894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943092#comment-13943092
]
David Smiley commented on SOLR-5894:
------------------------------------
Awesome work Toke!
> Speed up high-cardinality facets with sparse counters
> -----------------------------------------------------
>
> Key: SOLR-5894
> URL: https://issues.apache.org/jira/browse/SOLR-5894
> Project: Solr
> Issue Type: Improvement
> Components: SearchComponents - other
> Affects Versions: 4.6.1, 4.7
> Reporter: Toke Eskildsen
> Priority: Minor
> Fix For: 4.6.1
>
> Attachments: SOLR-5894.patch,
> author_7M_tags_1852_logged_queries_warmed.png
>
>
> Field based faceting in Solr has two phases: Collecting counts for tags in
> facets and extracting the requested tags.
> The execution time for the collecting phase is approximately linear to the
> number of hits and the number of references from hits to tags. This phase is
> not the focus here.
> The extraction time scales with the number of unique tags in the search
> result, but is also heavily influenced by the total number of unique tags in
> the facet as every counter, 0 or not, is visited by the extractor (at least
> for count order). For fields with millions of unique tag values this means
> 10s of milliseconds added to the minimum response time (see
> https://sbdevel.wordpress.com/2014/03/18/sparse-facet-counting-on-a-real-index/
> for a test on a corpus with 7M unique values in the facet).
> The extractor needs to visit every counter due to the current counter
> structure being a plain int-array of size #unique_tags. Switching to a sparse
> structure, where only the tag counters > 0 are visited, makes the extraction
> time linear to the number of unique tags in the result set.
> Unfortunately the number of unique tags in the result set is unknown at
> collect time, so it is not possible to reliably select sparse counting vs.
> full counting up front. Luckily there exists solutions for sparse sets that
> has the property of switching to non-sparse-mode without a switch-penalty,
> when the sparse-threshold is exceeded (see
> http://programmingpraxis.com/2012/03/09/sparse-sets/ for an example). This
> JIRA aims to implement this functionality in Solr (a proof of concept patch
> will be provided shortly).
--
This message was sent by Atlassian JIRA
(v6.2#6252)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]