[ https://issues.apache.org/jira/browse/SOLR-15715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819321#comment-17819321 ]
Gus Heck commented on SOLR-15715:
---------------------------------
Obviously I'm very late to the party and have likely missed something, but
looking at this and some of the code, I'm given to wonder why this wasn't
achieved via client request preferences (which was implemented), node labeling,
and replica placement (to ensure certain labeled nodes never get data).
Nodes without data that receive all the client requests: job done, right?
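To make the placement half of that concrete, here is a minimal sketch, assuming each node carries a set of string labels and that a hypothetical "coordinator" label marks nodes that must never receive data. The `Node` record, the label name and `place()` are illustrative only and are not Solr's replica placement plugin API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: replica placement that never picks a node carrying
// the (assumed) "coordinator" label, so those nodes stay free of data.
public class CoordinatorAwarePlacement {

    static final String COORDINATOR_LABEL = "coordinator"; // hypothetical label name

    record Node(String name, Set<String> labels) {}

    // Data nodes are simply the live nodes without the coordinator label.
    static List<Node> eligibleForReplicas(List<Node> liveNodes) {
        List<Node> result = new ArrayList<>();
        for (Node n : liveNodes) {
            if (!n.labels().contains(COORDINATOR_LABEL)) {
                result.add(n);
            }
        }
        return result;
    }

    // Assigns replicas round-robin across the eligible (data) nodes only.
    static List<String> place(List<Node> liveNodes, int replicaCount) {
        List<Node> eligible = eligibleForReplicas(liveNodes);
        if (eligible.isEmpty()) {
            throw new IllegalStateException("no data nodes available");
        }
        List<String> chosen = new ArrayList<>();
        for (int i = 0; i < replicaCount; i++) {
            chosen.add(eligible.get(i % eligible.size()).name());
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(
            new Node("node1", Set.of("coordinator")),
            new Node("node2", Set.of()),
            new Node("node3", Set.of("rack-a")));
        System.out.println(place(nodes, 4)); // [node2, node3, node2, node3]
    }
}
```

Real placement would also weigh disk, load and spreading rules; the only point here is that excluding labeled nodes is a filter applied before any other placement logic.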
In the current code it seems that only tests will ever call
"setPreferredNodes()", which makes me think: does this feature only work if the
end-user client manually tracks which nodes are coordinators?
I guess my biggest question is: why do we need subclasses of HttpSolrCall? This
seems achievable with node labels, a node role that adds a label, client smarts,
and replica placement.
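A rough sketch of the "client smarts" piece, assuming the client can read each live node's labels from cluster state. `orderForRequest()` and the "coordinator" label are hypothetical; the idea is that the preferred-node list is derived from labels the client already sees, rather than handed in manually through something like `setPreferredNodes()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical client-side routing: prefer nodes labeled as coordinators,
// discovered from cluster state instead of a hand-maintained list.
public class CoordinatorPreferringRouter {

    static final String COORDINATOR_LABEL = "coordinator"; // hypothetical label name

    // labelsByNode: node name -> labels, as the client might read them from cluster state.
    static List<String> orderForRequest(Map<String, Set<String>> labelsByNode,
                                        List<String> liveNodes) {
        List<String> coordinators = new ArrayList<>();
        List<String> others = new ArrayList<>();
        for (String node : liveNodes) {
            Set<String> labels = labelsByNode.getOrDefault(node, Set.of());
            if (labels.contains(COORDINATOR_LABEL)) {
                coordinators.add(node);
            } else {
                others.add(node);
            }
        }
        // Coordinators first; data nodes stay at the end as a fallback
        // in case every coordinator is unavailable.
        List<String> ordered = new ArrayList<>(coordinators);
        ordered.addAll(others);
        return ordered;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> labels = Map.of(
            "node1", Set.of(),
            "node2", Set.of("coordinator"),
            "node3", Set.of("coordinator"));
        System.out.println(orderForRequest(labels, List.of("node1", "node2", "node3")));
        // [node2, node3, node1]
    }
}
```

A real client would likely also shuffle within each group to spread load; that is left out to keep the ordering deterministic.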
I see a bunch of references to "synthetic collection" in the code, but it's not
clear what this is or why it's needed. From the javadoc:
{code:java}
/**
 * A coordinator node can serve requests as if it hosts all collections in the cluster. it does so
 * by hosting a synthetic replica for each configset used in the cluster.
 */
{code}
Why do we want to do that? Existing code already knew how to find shards,
delegate sub-requests and coordinate a response. Why do we need to fake the
location of the collections with a synthetic replica?
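For reference, the "coordinate a response" step that existing distributed search already performs boils down to merging per-shard results. This is a simplified, hypothetical sketch in plain Java (no Solr classes, shard responses assumed to be already collected) that keeps a global top-k with a min-heap.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of the merge step of a distributed query: each shard
// has returned its own top hits, and the coordinating node keeps a global top-k.
public class ShardMerge {

    record Hit(String id, double score) {}

    static List<Hit> mergeTopK(List<List<Hit>> perShardHits, int k) {
        // Min-heap ordered by score: the root is the weakest hit currently kept.
        PriorityQueue<Hit> heap = new PriorityQueue<>(Comparator.comparingDouble(Hit::score));
        for (List<Hit> shard : perShardHits) {
            for (Hit h : shard) {
                heap.offer(h);
                if (heap.size() > k) {
                    heap.poll(); // evict the lowest-scoring hit
                }
            }
        }
        // Return the survivors best-first.
        List<Hit> result = new ArrayList<>(heap);
        result.sort(Comparator.comparingDouble(Hit::score).reversed());
        return result;
    }

    public static void main(String[] args) {
        List<List<Hit>> responses = List.of(
            List.of(new Hit("a", 3.0), new Hit("b", 1.0)),  // shard 1
            List.of(new Hit("c", 2.5), new Hit("d", 0.5)),  // shard 2
            List.of(new Hit("e", 2.0)));                     // shard 3
        List<String> ids = mergeTopK(responses, 3).stream().map(Hit::id).toList();
        System.out.println(ids); // [a, c, e]
    }
}
```

Nothing in this step needs a local replica; it only needs the shard responses, which is the crux of the question about synthetic replicas.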
> Dedicated query coordinator nodes in the solr cluster
> -----------------------------------------------------
>
> Key: SOLR-15715
> URL: https://issues.apache.org/jira/browse/SOLR-15715
> Project: Solr
> Issue Type: New Feature
> Components: SearchComponents - other
> Affects Versions: 8.10.1
> Reporter: Hitesh Khamesra
> Assignee: Noble Paul
> Priority: Major
> Fix For: 9.1
>
> Attachments: coordinator-poc.jpg, coordinator-poc.pdf,
> coordinator-vs-data-nodes.jpg, regular-node.jpg, regular-node.pdf
>
> Time Spent: 3h 40m
> Remaining Estimate: 0h
>
> We have a large collection with 1000s of shards in the Solr cluster. We have
> observed that a distributed Solr query takes many resources (threads, memory,
> etc.) on the Solr data node (the node that contains the indexes). Thus we need
> dedicated query nodes to execute distributed queries on large Solr
> collections. That would reduce the memory/CPU pressure on Solr data nodes.
> Elasticsearch has similar functionality
> [here|https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-node.html#coordinating-node]
> [~noble.paul] [~ichattopadhyaya]
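To put rough numbers on the resource concern in the description: assuming Solr's usual two-phase distributed search, where in the first phase each shard returns up to start + rows (id, sort value) entries to the coordinating node, the amount that node must merge grows linearly with the shard count. `firstPhaseEntries()` is a hypothetical helper for that arithmetic only.

```java
// Hypothetical helper: upper bound on first-phase entries the coordinating
// node receives, assuming each shard returns up to start + rows entries.
public class MergeCost {

    static long firstPhaseEntries(int shards, int start, int rows) {
        // Widen to long before multiplying to avoid int overflow.
        return (long) shards * (start + rows);
    }

    public static void main(String[] args) {
        System.out.println(firstPhaseEntries(1000, 0, 10));   // 10000
        System.out.println(firstPhaseEntries(1000, 990, 10)); // 1000000
    }
}
```

These are upper bounds: shards with fewer matching documents return fewer entries.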
--
This message was sent by Atlassian Jira
(v8.20.10#820010)