[
https://issues.apache.org/jira/browse/SOLR-9142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15296257#comment-15296257
]
Varun Thacker commented on SOLR-9142:
-------------------------------------
I tried {{method:stream}} by indexing the same data on the {{gettingstarted}}
example. The results are the same as compared to the non cloud setup.
The speed gains are lesser though. The query returns in the 3s range. I guess
combining the results is adding the overhead.
> Improve JSON nested facets effeciency
> -------------------------------------
>
> Key: SOLR-9142
> URL: https://issues.apache.org/jira/browse/SOLR-9142
> Project: Solr
> Issue Type: Bug
> Reporter: Varun Thacker
>
> I indexed a dataset of 2M docs
> {{top_facet_s}} has a cardinality of 1000 which is the top level facet.
> For nested facets it has two fields {{sub_facet_unique_s}} and
> {{sub_facet_unique_td}} which are string and double and have cardinality 2M
> The nested query for the double field returns in the 1s mark always. The
> nested query for the string field takes roughly 10s to execute.
> {code:title=nested string facet|borderStyle=solid}
> q=*:*&rows=0&json.facet=
> {
> "top_facet_s": {
> "type": "terms",
> "limit": -1,
> "field": "top_facet_s",
> "mincount": 1,
> "excludeTags": "ANY",
> "facet": {
> "sub_facet_unique_s": {
> "type": "terms",
> "limit": 1,
> "field": "sub_facet_unique_s",
> "mincount": 1
> }
> }
> }
> }
> {code}
> {code:title=nested double facet|borderStyle=solid}
> q=*:*&rows=0&json.facet=
> {
> "top_facet_s": {
> "type": "terms",
> "limit": -1,
> "field": "top_facet_s",
> "mincount": 1,
> "excludeTags": "ANY",
> "facet": {
> "sub_facet_unique_s": {
> "type": "terms",
> "limit": 1,
> "field": "sub_facet_unique_td",
> "mincount": 1
> }
> }
> }
> }
> {code}
> I tried to dig deeper to understand why are string nested faceting that slow
> compared to numeric field
> Since the top facet has a cardinality of 1000 we have to calculate sub facets
> on each of them. Now the key difference was in the implementation of the two .
> For the string field, In {{FacetField#getFieldCacheCounts}} we call
> {{createCollectAcc}} with nDocs=0 and numSlots=2M . This then initializes an
> array of 2M. So we create a 2M array 1000 times for this one query which from
> what I understand makes this query slow.
> For numeric fields {{FacetFieldProcessorNumeric#calcFacets}} uses a
> CountSlotAcc which doesn't assign a huge array. In this query it calls
> {{createCollectAcc}} with numDocs=2k and numSlots=1024 .
> In string faceting, we create the 2M array because the cardinality is 2M and
> we use the array position as the ordinal and value as the count. If we could
> improve on this it would speed things up significantly? For sub-facets we
> know the maximum cardinality can be at max the top level bucket count.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]