[
https://issues.apache.org/jira/browse/SOLR-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Per Steffensen updated SOLR-5444:
---------------------------------
Description:
We have a 6-Solr-node (release 4.4.0) setup with 12 billion "small" documents
loaded across 3 collections. The documents have the following fields
* a_dlng_doc_sto (docvalue long)
* b_dlng_doc_sto (docvalue long)
* c_dstr_doc_sto (docvalue string)
* timestamp_lng_ind_sto (indexed long)
* d_lng_ind_sto (indexed long)
From schema.xml:
{code}
<dynamicField name="*_dstr_doc_sto" type="dstring" indexed="false"
stored="true" required="true" docValues="true"/>
<dynamicField name="*_lng_ind_sto" type="long" indexed="true"
stored="true"/>
<dynamicField name="*_dlng_doc_sto" type="dlng" indexed="false"
stored="true" required="true" docValues="true"/>
...
<fieldType name="dstring" class="solr.StrField" sortMissingLast="true"
docValuesFormat="Disk"/>
<fieldType name="dlng" class="solr.TrieLongField" precisionStep="0"
positionIncrementGap="0" docValuesFormat="Disk"/>
{code}
timestamp_lng_ind_sto decides which collection a document goes into.
We execute queries of the following form:
* q=timestamp_lng_ind_sto:\[x TO y\] AND d_lng_ind_sto:(a OR b OR ... OR n)
* facet=true&facet.field=a_dlng_doc_sto&facet.zeros=false&facet.mincount=1&facet.limit=<asked-for-facets>&rows=0&start=0
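A minimal sketch of how the query string for such a request could be assembled, using only Python's standard library. The helper name build_facet_query and the sample values for x, y, a ... n and the facet limit are hypothetical; only the parameter names and field names come from the request format above.

```python
from urllib.parse import parse_qs, urlencode


def build_facet_query(x, y, d_values, facet_limit):
    """Return a URL-encoded query string for the facet request format above.

    x, y        -- bounds of the timestamp_lng_ind_sto range (hypothetical values)
    d_values    -- the a, b, ..., n values OR'ed together for d_lng_ind_sto
    facet_limit -- the <asked-for-facets> value passed as facet.limit
    """
    d_clause = " OR ".join(str(v) for v in d_values)
    params = [
        ("q", f"timestamp_lng_ind_sto:[{x} TO {y}] AND d_lng_ind_sto:({d_clause})"),
        ("facet", "true"),
        ("facet.field", "a_dlng_doc_sto"),
        ("facet.zeros", "false"),
        ("facet.mincount", "1"),
        ("facet.limit", str(facet_limit)),
        ("rows", "0"),   # no documents returned, only facet counts
        ("start", "0"),
    ]
    # urlencode percent-escapes the brackets, parentheses and spaces in q
    return urlencode(params)


# Sample values are made up for illustration only.
query_string = build_facet_query(1000, 2000, [1, 2, 3], 10)
decoded = parse_qs(query_string)
print(decoded["q"][0])
```

The resulting string would be appended to the collection's select handler URL, which is not shown here because the report does not give it.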
We see very slow response times when a query hits a large number of rows
spanning lots of facets, but we only ask for "a few" of those facets.
Example:
* With x and y plus a, b ... n set to values so that
** The timestamp_lng_ind_sto:\[x TO y\] part of the search criteria alone hits
about 1.7 billion documents
** The d_lng_ind_sto:(a OR b OR ... OR n) part of the search criteria alone hits
about 500,000 documents
** The combined search criteria (timestamp_lng_ind_sto AND'ed with
d_lng_ind_sto) hit about 200,000 documents
!Profiling_SimpleFacets_getListedTermCounts_path.png!
was:TBD
> Slow response on facet search, lots of facets, asking for few facets in
> response
> --------------------------------------------------------------------------------
>
> Key: SOLR-5444
> URL: https://issues.apache.org/jira/browse/SOLR-5444
> Project: Solr
> Issue Type: Improvement
> Components: SolrCloud
> Affects Versions: 4.4
> Reporter: Per Steffensen
> Assignee: Per Steffensen
> Labels: docvalue, faceted-search, performance
> Fix For: 4.7
>
> Attachments: Profiiling_SimpleFacets_getListedTermCounts_path.png,
> Profiling_SimpleFacets_getTermCounts_path.png,
> Responsetime_func_of_facets_asked_for.png