[jira] [Commented] (SOLR-15715) Dedicated query coordinator nodes in the solr cluster

Gus Heck (Jira) Wed, 21 Feb 2024 08:43:04 -0800


    [ https://issues.apache.org/jira/browse/SOLR-15715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819321#comment-17819321 ]


Gus Heck commented on SOLR-15715:
---------------------------------

Obviously I'm very late to the party and have likely missed something, but 
looking at this and some of the code, I wonder why this wasn't achieved via 
client request preferences (which were implemented), node labeling, and replica 
placement (to ensure certain labeled nodes never get data).

Nodes that hold no data and receive all the client requests would get the job 
done, right?

In the current code, it seems that only tests ever call "setPreferredNodes()", 
which makes me think this feature only works if the end-user client manually 
tracks which nodes are coordinators?

I guess my biggest question is: why do we need subclasses of HttpSolrCall? This 
seems achievable with node labels, a node role that adds a label, client 
smarts, and replica placement.
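
To make that alternative concrete, here is a minimal sketch of the idea. It is 
not Solr code: the Node class, the "coordinator" label, and both method names 
are hypothetical and exist only for illustration. It assumes a node role simply 
adds a label, placement skips labeled nodes, and the client prefers them.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class CoordinatorRoutingSketch {

    // Hypothetical stand-in for a cluster node that carries labels; not a Solr class.
    static final class Node {
        final String name;
        final Set<String> labels;

        Node(String name, Set<String> labels) {
            this.name = name;
            this.labels = labels;
        }

        // Assumption: a "coordinator" node role does nothing more than add this label.
        boolean isCoordinator() {
            return labels.contains("coordinator");
        }
    }

    // Replica placement side: coordinator-labeled nodes are never offered as a home for data.
    static List<Node> placementCandidates(List<Node> liveNodes) {
        return liveNodes.stream()
                .filter(n -> !n.isCoordinator())
                .collect(Collectors.toList());
    }

    // Client side: send requests to coordinator-labeled nodes, falling back to
    // every live node when no coordinator is available.
    static List<Node> requestTargets(List<Node> liveNodes) {
        List<Node> coordinators = liveNodes.stream()
                .filter(Node::isCoordinator)
                .collect(Collectors.toList());
        return coordinators.isEmpty() ? liveNodes : coordinators;
    }

    public static void main(String[] args) {
        List<Node> live = List.of(
                new Node("data-1", Set.of("data")),
                new Node("data-2", Set.of("data")),
                new Node("coord-1", Set.of("coordinator")));

        for (Node n : placementCandidates(live)) {
            System.out.println("may host replicas: " + n.name);
        }
        for (Node n : requestTargets(live)) {
            System.out.println("receives client requests: " + n.name);
        }
    }
}
```

In this sketch the client learns which nodes are coordinators from the labels 
rather than from a manually maintained list, which is the "client smarts" part.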

I see a bunch of references to "synthetic collection" in the code, but it's not 
clear what this is or why it's needed. From the javadoc: 
{code:java}
/**
 * A coordinator node can serve requests as if it hosts all collections in the cluster. it does so
 * by hosting a synthetic replica for each configset used in the cluster.
{code}
Why do we want to do that? Existing code already knew how to find shards, 
delegate sub-requests, and coordinate a response. Why do we need to fake the 
location of the collections with a synthetic replica?

> Dedicated query coordinator nodes in the solr cluster
> -----------------------------------------------------
>
>                 Key: SOLR-15715
>                 URL: https://issues.apache.org/jira/browse/SOLR-15715
>             Project: Solr
>          Issue Type: New Feature
>          Components: SearchComponents - other
>    Affects Versions: 8.10.1
>            Reporter: Hitesh Khamesra
>            Assignee: Noble Paul
>            Priority: Major
>             Fix For: 9.1
>
>         Attachments: coordinator-poc.jpg, coordinator-poc.pdf, 
> coordinator-vs-data-nodes.jpg, regular-node.jpg, regular-node.pdf
>
>          Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> We have a large collection with 1000s of shards in the Solr cluster. We have 
> observed that distributed Solr queries consume many resources (threads, 
> memory, etc.) on the Solr data nodes (nodes that contain indexes). Thus we 
> need dedicated query nodes to execute distributed queries on large Solr 
> collections. That would reduce the memory/CPU pressure on the Solr data nodes.
> Elasticsearch has similar functionality 
> [here|https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-node.html#coordinating-node]
>  
> [~noble.paul] [~ichattopadhyaya]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

