[jira] [Updated] (SOLR-5725) Efficient facets without counts for enum method

Mikhail Khludnev (JIRA) Sun, 04 Sep 2016 23:51:06 -0700

     [ https://issues.apache.org/jira/browse/SOLR-5725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Mikhail Khludnev updated SOLR-5725:
-----------------------------------
    Description: 
h1. UPD: Specification
To cap facet counts at 1, specify {{facet.exists=true}}. It can be used with 
{{facet.method=enum}} or when {{facet.method}} is omitted. It can be used only 
on non-trie fields, i.e. strings. It may speed up facet counting on large 
indices and/or high-cardinality facet values.
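
As an illustration of the specification above, here is a minimal sketch that assembles the request parameters for such a facet query. The field name {{category}}, the class name and the helper {{buildParams}} are hypothetical. Adding {{facet.mincount=1}}, {{rows=0}} and {{q=*:*}} is an assumption about a typical "populate the filters" request, not something the patch requires.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class FacetExistsQuery {

    // Builds a URL query string for a facet request whose counts are capped at 1.
    // "field" is a hypothetical non-trie (string) field name supplied by the caller.
    public static String buildParams(String field) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");              // assumption: match all documents
        params.put("rows", "0");             // assumption: only facets are needed
        params.put("facet", "true");
        params.put("facet.field", field);
        params.put("facet.method", "enum");  // may also be omitted, per the specification
        params.put("facet.exists", "true");  // the parameter introduced by this issue
        params.put("facet.mincount", "1");   // assumption: hide values with no matches
        return params.entrySet().stream()
                .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        System.out.println(buildParams("category"));
    }
}
```

The resulting string would be appended to the select handler URL of your own Solr core or collection.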

h3. Short version:
This improves performance for facet.method=enum when it is enough to know that 
the facet count is greater than 0, for example when you dynamically populate 
filters on a search form. The new method checks whether two bitsets intersect 
instead of counting the intersection size.

h3. Long version:
We have a dataset containing hundreds of millions of records. We facet by 
dozens of fields with many facet excludes, and the fields have a relatively 
small number of unique values, around thousands.
Before executing a search, users work with an "advanced search" form. Our goal 
is to populate dozens of filters with the values that are applicable given the 
other selected values, so basically this is a use case for facets with 
mincount=1, but without the need for actual counts.

Our performance tests showed that facet.method=enum works much better than 
fc/fcs, probably due to a specific ratio of "docset" to "unique terms count". 
For example, the average query execution time was 1500ms with fc, 2600ms with 
fcs, and 280ms with enum. Profiling indicated that the majority of the time 
for enum was spent on intersecting docsets.

Here's a patch that introduces an extension to facet calculation for 
method=enum. Basically, it uses docSetA.intersects(docSetB) instead of 
docSetA.intersectionSize(docSetB).
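
To illustrate why an existence check can be cheaper than a count, here is a minimal sketch that uses {{java.util.BitSet}} as a stand-in for Solr's docsets. This is an assumption made for the sake of a self-contained example: the class and method names below are hypothetical and are not the patch's code.

```java
import java.util.BitSet;

public class IntersectsVsSize {

    // Exact count: ANDs the two bitsets in full, then counts the set bits.
    public static int intersectionSize(BitSet a, BitSet b) {
        BitSet copy = (BitSet) a.clone();
        copy.and(b);
        return copy.cardinality();
    }

    // Existence check: BitSet.intersects can return as soon as it finds
    // one bit that is set in both bitsets.
    public static boolean intersects(BitSet a, BitSet b) {
        return a.intersects(b);
    }

    // Facet count capped at 1, in the spirit of facet.exists=true.
    public static int cappedCount(BitSet baseDocs, BitSet termDocs) {
        return intersects(baseDocs, termDocs) ? 1 : 0;
    }

    public static void main(String[] args) {
        BitSet baseDocs = new BitSet();   // documents matching the query
        baseDocs.set(1);
        baseDocs.set(5);
        baseDocs.set(9);

        BitSet termDocs = new BitSet();   // documents containing one facet term
        termDocs.set(5);
        termDocs.set(9);
        termDocs.set(12);

        System.out.println(intersectionSize(baseDocs, termDocs)); // 2
        System.out.println(cappedCount(baseDocs, termDocs));      // 1
    }
}
```

When only the presence of a value matters, the capped count carries the same information as the exact count, while the check is free to stop at the first common document.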

As a result we were able to reduce our average query time from 280ms to 60ms.


  was:
Short version:
This improves performance for facet.method=enum when it is enough to know that 
the facet count is greater than 0, for example when you dynamically populate 
filters on a search form. The new method checks whether two bitsets intersect 
instead of counting the intersection size.


Long version:
We have a dataset containing hundreds of millions of records. We facet by 
dozens of fields with many facet excludes, and the fields have a relatively 
small number of unique values, around thousands.
Before executing a search, users work with an "advanced search" form. Our goal 
is to populate dozens of filters with the values that are applicable given the 
other selected values, so basically this is a use case for facets with 
mincount=1, but without the need for actual counts.

Our performance tests showed that facet.method=enum works much better than 
fc/fcs, probably due to a specific ratio of "docset" to "unique terms count". 
For example, the average query execution time was 1500ms with fc, 2600ms with 
fcs, and 280ms with enum. Profiling indicated that the majority of the time 
for enum was spent on intersecting docsets.

Here's a patch that introduces an extension to facet calculation for 
method=enum. Basically, it uses docSetA.intersects(docSetB) instead of 
docSetA.intersectionSize(docSetB).

As a result we were able to reduce our average query time from 280ms to 60ms.



> Efficient facets without counts for enum method
> -----------------------------------------------
>
>                 Key: SOLR-5725
>                 URL: https://issues.apache.org/jira/browse/SOLR-5725
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Alexey Kozhemiakin
>            Assignee: Mikhail Khludnev
>             Fix For: master (7.0), 6.3
>
>         Attachments: SOLR-5725-5x.patch, SOLR-5725-master.patch, 
> SOLR-5725.patch, SOLR-5725.patch, SOLR-5725.patch, SOLR-5725.patch, 
> SOLR-5725.patch, SOLR-5725.patch, SOLR-5725.patch, 
> facet.limit=0&facet.missing=true discrepancy between cloud and non-distr.txt
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
