[
https://issues.apache.org/jira/browse/SOLR-15715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17448793#comment-17448793
]
Ishan Chattopadhyaya commented on SOLR-15715:
---------------------------------------------
I just wrapped up an initial round of testing on two setups:
* a regular setup (6 data nodes)
* a POC setup (1 dedicated overseer + 1 coordinator + 6 data nodes).
h2. Setup details
Regular setup:
* 6 nodes
* 2GB heap space on every node
* 6 collections, 6 shards each, 1 replica per shard
* Documents: 30M per collection (ecommerce events dataset)
* Queries: 20,000 per collection, all faceting queries (filtered by
time ranges)
* Query rate: 2 threads per collection, 6 collections at the same time.
* Query target node: first data node (port 50000)
POC setup:
* 8 nodes: 1 dedicated overseer, 1 coordinator node, 6 data nodes
* 2GB heap space on every node
* 6 collections, 6 shards each, 1 replica per shard
* Documents: 30M per collection (ecommerce events dataset)
* Queries: 20,000 per collection, all faceting queries (filtered by
time ranges)
* Query rate: 2 threads per collection, 6 collections at the same time.
* Query target node: coordinator node (port 50001)
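For anyone who wants to reproduce the query load, here is a minimal sketch of a driver matching the description above: 2 threads per collection, all 6 collections queried at the same time, every request sent to a single target node. This is not the harness used for these tests. The collection names, the host, and the facet parameters (including the {{event_type}} and {{timestamp}} fields) are assumptions for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode
from urllib.request import urlopen

# Values taken from the setup description.
THREADS_PER_COLLECTION = 2
QUERIES_PER_COLLECTION = 20_000

# Assumptions (hypothetical): collection names and host.
COLLECTIONS = [f"collection{i}" for i in range(1, 7)]
# Port 50001 is the coordinator node (POC setup);
# use port 50000 to target the first data node (regular setup).
BASE_URL = "http://localhost:50001/solr"

# Hypothetical facet query filtered by a time range; field names are made up.
EXAMPLE_QUERY = {
    "q": "*:*",
    "rows": 0,
    "facet": "true",
    "facet.field": "event_type",
    "fq": "timestamp:[2021-01-01T00:00:00Z TO 2021-01-02T00:00:00Z]",
}


def build_url(base_url, collection, params):
    """Build a /select URL for one query against one collection."""
    return f"{base_url}/{collection}/select?{urlencode(params)}"


def http_send(url):
    """Send one request and return the HTTP status code."""
    with urlopen(url) as resp:
        return resp.status


def run_collection(collection, queries, send=http_send,
                   base_url=BASE_URL, threads=THREADS_PER_COLLECTION):
    """Run all queries for one collection using a fixed number of threads."""
    urls = [build_url(base_url, collection, q) for q in queries]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(send, urls))


def run_all(queries_by_collection, send=http_send, base_url=BASE_URL):
    """Query every collection concurrently, one driver task per collection."""
    workers = max(1, len(queries_by_collection))
    with ThreadPoolExecutor(max_workers=workers) as outer:
        futures = {
            name: outer.submit(run_collection, name, qs, send, base_url)
            for name, qs in queries_by_collection.items()
        }
        return {name: fut.result() for name, fut in futures.items()}
```

Calling {{run_all({c: [EXAMPLE_QUERY] * QUERIES_PER_COLLECTION for c in COLLECTIONS})}} would issue 120,000 requests in total with at most 12 in flight at once. The {{send}} parameter is there so the driver can be dry-run without a cluster.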
h2. Performance results
Here are the results:
Regular setup results:
^!0001.jpg!^
POC results:
!0001.jpg!
h2. Conclusion
* Because query aggregation runs on a separate coordinator node, memory usage on the data nodes is very low.
* The isolated coordinator node feature for query aggregation is working as designed.
> Dedicated query aggregator nodes in the solr cluster.
> ------------------------------------------------------
>
> Key: SOLR-15715
> URL: https://issues.apache.org/jira/browse/SOLR-15715
> Project: Solr
> Issue Type: New Feature
> Components: SearchComponents - other
> Affects Versions: 8.10.1
> Reporter: Hitesh Khamesra
> Priority: Major
> Attachments: 0001-1.jpg, 0001.jpg, coordinator-poc.pdf,
> regular-node.pdf
>
>
> We have a large collection with 1000s of shards in the Solr cluster. We have
> observed that a distributed Solr query takes many resources (threads, memory,
> etc.) on the Solr data node (the node that contains the indexes). Thus we need
> dedicated query nodes to execute distributed queries on a large Solr
> collection. That would reduce the memory/CPU pressure on the Solr data nodes.
> Elasticsearch has similar functionality
> [here|https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-node.html#coordinating-node]
>
> [~noble.paul] [~ichattopadhyaya]