[
https://issues.apache.org/jira/browse/SOLR-5725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15452409#comment-15452409
]
Yonik Seeley commented on SOLR-5725:
------------------------------------
bq. Hmmm, I'm looking at the algorithm in SimpleFacets.getFacetTermEnumCounts
for when the df is < minDfFilterCache – which loops over docs in the
PostingsEnum and checks fastForRandomSet. Wouldn't it be better to leap-frog
(and then don't need a "fast-for-random-set" but do need it to be sorted)?
I had the exact same thought!
This code was written a long time ago, when skipping on postings was slower.
The more sparse a docset is, the better performing skipping should be.
In the case of a base docset that matches many documents and a term that
matches few, it may still be faster to not skip. Also, a low minDfFilterCache
would likely always be fine not skipping. The potentially big performance
improvement would be when minDfFilterCache is large (and we encounter terms
with large df as well). We'd need benchmarking to try and determine where the
crossover point is today.
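
To make the two strategies concrete, here is a minimal sketch, not the SimpleFacets code. It assumes plain sorted int arrays as stand-ins for the PostingsEnum and the base docset, and java.util.BitSet as a stand-in for a "fast-for-random-set" docset; the class and method names are hypothetical. scanCount visits every posting and probes the set, while leapFrogCount lets each side skip ahead to the other's current candidate.

```java
import java.util.Arrays;
import java.util.BitSet;

final class LeapFrogSketch {

    // Scan strategy: visit every posting and probe a random-access set.
    static int scanCount(int[] postings, BitSet baseDocs) {
        int count = 0;
        for (int doc : postings) {
            if (baseDocs.get(doc)) {
                count++;
            }
        }
        return count;
    }

    // Leap-frog strategy: both inputs must be sorted and free of duplicates.
    // Whichever side is behind skips forward to the other side's current doc.
    static int leapFrogCount(int[] postings, int[] baseDocs) {
        int i = 0, j = 0, count = 0;
        while (i < postings.length && j < baseDocs.length) {
            if (postings[i] == baseDocs[j]) {
                count++;
                i++;
                j++;
            } else if (postings[i] < baseDocs[j]) {
                i = advance(postings, i + 1, baseDocs[j]);
            } else {
                j = advance(baseDocs, j + 1, postings[i]);
            }
        }
        return count;
    }

    // Hypothetical stand-in for an iterator's advance(target): returns the
    // first index >= from whose value is >= target, or sorted.length if none.
    static int advance(int[] sorted, int from, int target) {
        int idx = Arrays.binarySearch(sorted, from, sorted.length, target);
        return idx >= 0 ? idx : -idx - 1;
    }

    public static void main(String[] args) {
        int[] postings = {2, 5, 9, 40};
        int[] base = {1, 2, 3, 9, 10, 41};
        BitSet baseBits = new BitSet();
        for (int doc : base) {
            baseBits.set(doc);
        }
        System.out.println("scan count:      " + scanCount(postings, baseBits));
        System.out.println("leap-frog count: " + leapFrogCount(postings, base));
    }
}
```

Both methods return the same count. Which one is faster depends on the relative sparsity of the two sides and on the cost of skipping, which is the crossover that would need benchmarking.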
> Efficient facets without counts for enum method
> -----------------------------------------------
>
> Key: SOLR-5725
> URL: https://issues.apache.org/jira/browse/SOLR-5725
> Project: Solr
> Issue Type: Improvement
> Components: search
> Reporter: Alexey Kozhemiakin
> Assignee: Mikhail Khludnev
> Fix For: master (7.0), 6.3
>
> Attachments: SOLR-5725-5x.patch, SOLR-5725-master.patch,
> SOLR-5725.patch, SOLR-5725.patch, SOLR-5725.patch, SOLR-5725.patch,
> SOLR-5725.patch
>
>
> Short version:
> This improves performance for facet.method=enum when it's enough to know that
> facet count>0, for example when you dynamically populate filters on a search
> form. The new method checks whether two bitsets intersect instead of
> counting the intersection size.
> Long version:
> We have a dataset containing hundreds of millions of records. We facet by
> dozens of fields with many facet excludes, and the fields have a relatively
> small number of unique values, on the order of thousands.
> Before executing a search, users work with an "advanced search" form. Our goal
> is to populate dozens of filters with the values that are applicable given the
> other selected values, so basically this is a use case for facets with
> mincount=1, but without needing the actual counts.
> Our performance tests showed that facet.method=enum works much better than
> fc/fcs, probably due to a specific ratio of "docset" to "unique terms count".
> For example, the average query execution time was 1500ms with method fc,
> 2600ms with fcs, and 280ms with enum. Profiling indicated that the majority of
> the time for enum was spent on intersecting docsets.
> Here's a patch that introduces an extension to facet calculation for
> method=enum. Basically it uses docSetA.intersects(docSetB) instead of
> docSetA.intersectionSize(docSetB).
> As a result we were able to reduce our average query time from 280ms to 60ms.
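
The difference between the two calls in the description above can be illustrated with a small sketch. It assumes java.util.BitSet as a stand-in for Solr's docsets; the class name and the fromDocs helper are hypothetical and are not part of the patch. Computing the size has to combine both sets in full, while an existence check may return as soon as one common document is found.

```java
import java.util.BitSet;

final class IntersectsSketch {

    // Hypothetical helper: build a bitset from a list of doc ids.
    static BitSet fromDocs(int... docs) {
        BitSet bits = new BitSet();
        for (int doc : docs) {
            bits.set(doc);
        }
        return bits;
    }

    // Counting: AND the two sets together and count every common document.
    static int intersectionSize(BitSet a, BitSet b) {
        BitSet copy = (BitSet) a.clone();
        copy.and(b);
        return copy.cardinality();
    }

    // Existence only: true if at least one document is in both sets.
    static boolean intersects(BitSet a, BitSet b) {
        return a.intersects(b);
    }

    public static void main(String[] args) {
        BitSet base = fromDocs(1, 5, 9);
        BitSet termDocs = fromDocs(9, 100);
        System.out.println("size:       " + intersectionSize(base, termDocs));
        System.out.println("intersects: " + intersects(base, termDocs));
    }
}
```

For a filter list that only needs to show which values are still applicable, intersects(...) answers the same question as intersectionSize(...) > 0.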