[jira] [Updated] (SOLR-8146) Allowing SolrJ CloudSolrClient to have preferred replica for query/read

Arcadius Ahouansou (JIRA) Tue, 10 Nov 2015 22:33:37 -0800

     [ https://issues.apache.org/jira/browse/SOLR-8146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Arcadius Ahouansou updated SOLR-8146:
-------------------------------------
    Description: 
h1. Background

Currently, the CloudSolrClient randomly picks a replica to query.
This is done by shuffling the list of live URLs and then picking the 
first item from the list.

This ticket is to allow more flexibility and, to some extent, control over 
which URLs are picked for queries.

Note that this is for queries only and would not affect update/delete/admin 
operations.

h1. Implementation

The current patch uses a regex pattern: only the URLs matching the regex given 
by the system property ```solr.preferredQueryNodePattern``` are moved to the 
top of the list of URLs.
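
The reordering described above could look roughly like the following. This is only a sketch and is not taken from the attached patch: the class name, the method name and the example host names are made up for illustration, and it assumes the regex is applied with find() (partial match), which may differ from what the patch does. Only the system property name comes from this ticket.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only (not the code in the patch).
final class PreferredUrlOrdering {

    // Shuffles the live URLs as before, then moves the URLs matching the
    // preferred pattern to the front. Relative (shuffled) order is kept
    // inside both groups.
    static List<String> order(List<String> liveUrls, String preferredPattern) {
        List<String> shuffled = new ArrayList<>(liveUrls);
        Collections.shuffle(shuffled);
        if (preferredPattern == null || preferredPattern.isEmpty()) {
            return shuffled; // no preference configured: current behaviour
        }
        Pattern pattern = Pattern.compile(preferredPattern);
        List<String> preferred = new ArrayList<>();
        List<String> others = new ArrayList<>();
        for (String url : shuffled) {
            // Assumption: a partial match is enough for a URL to be preferred.
            if (pattern.matcher(url).find()) {
                preferred.add(url);
            } else {
                others.add(url);
            }
        }
        preferred.addAll(others);
        return preferred;
    }

    public static void main(String[] args) {
        // Made-up host names.
        List<String> liveUrls = Arrays.asList(
                "http://rack2-solr1:8983/solr",
                "http://rack1-solr1:8983/solr",
                "http://rack2-solr2:8983/solr",
                "http://rack1-solr2:8983/solr");
        // The default value "rack1" is only there so the demo shows an effect.
        String pattern = System.getProperty("solr.preferredQueryNodePattern", "rack1");
        System.out.println(order(liveUrls, pattern));
    }
}
```

Since the non-matching URLs stay in the list, they remain available as a fallback when no preferred node can serve the request.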

h1. Applications
For simplicity, let's say that we have a SolrCloud cluster deployed on 2 
separate racks: rack1 and rack2.

On each rack, we have a set of SolrCloud VMs as well as a couple of client VMs 
querying Solr using SolrJ.

All Solr nodes are identical and have the same number of collections.

What we would like to achieve is:
- clients on rack1 will by preference query only SolrCloud nodes on rack1, and 
- clients on rack2 will by preference query only SolrCloud nodes on rack2.
- Cross-rack reads will happen if and only if the client's own rack has no 
available Solr node to serve a request.

In other words, we want read operations to be local to a rack whenever possible.
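
The intended read behaviour can be sketched as a two-pass selection: try the nodes on the client's own rack first, and go cross-rack only when none of them is available. The class, the method, the node names and the availability check below are all hypothetical and are only meant to illustrate the goal, not how SolrJ actually does failover.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical illustration of "local rack first, cross-rack as a fallback".
final class RackLocalRead {

    // Returns the first available node, looking at nodes that match the
    // preferred pattern before any other node.
    static Optional<String> pickNode(List<String> nodes,
                                     String preferredPattern,
                                     Predicate<String> isAvailable) {
        Pattern pattern = Pattern.compile(preferredPattern);
        // First pass: nodes on the preferred (local) rack.
        for (String node : nodes) {
            if (pattern.matcher(node).find() && isAvailable.test(node)) {
                return Optional.of(node);
            }
        }
        // Second pass: cross-rack read, only reached when no local node is up.
        for (String node : nodes) {
            if (!pattern.matcher(node).find() && isAvailable.test(node)) {
                return Optional.of(node);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up node names; "down" simulates the nodes that are unavailable.
        List<String> nodes = Arrays.asList("rack2-solr1", "rack1-solr1", "rack1-solr2");
        Set<String> down = new HashSet<>(Arrays.asList("rack1-solr1", "rack1-solr2"));

        // A client on rack1 stays on rack1 while a rack1 node is available...
        System.out.println(pickNode(nodes, "rack1", n -> true));
        // ...and reads from rack2 only when every rack1 node is down.
        System.out.println(pickNode(nodes, "rack1", n -> !down.contains(n)));
        System.out.println(pickNode(Collections.<String>emptyList(), "rack1", n -> true));
    }
}
```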

Note that write/update/delete/admin operations should not be affected.

Initially, I thought it may be good to have Solr nodes tagged with rackID 
(snitch?) for matching the hosts.

Note that this feature may have many uses, such as SOLR-5501.

Note that in our use case, we have a cross-DC deployment, so replace 
rack1/rack2 with DC1/DC2.

Any comments would be much appreciated.

Thanks.


  was:
This is a simple proposal to allow more flexibility about which node SolrJ 
queries first.
This is mainly to avoid unnecessary traffic in the network.

For simplicity, let's say that we have a SolrCloud cluster deployed on 2 
separate racks: rack1 and rack2.

On each rack, we have a set of SolrCloud VMs as well as a couple of client VMs 
querying Solr using SolrJ.

All Solr nodes are identical and have the same number of collections.

What we would like to achieve is:
- clients on rack1 will by preference query only SolrCloud nodes on rack1, and 
- clients on rack2 will by preference query only SolrCloud nodes on rack2.
- Cross-rack reads will happen if and only if the client's own rack has no 
available Solr node to serve a request.

In other words, we want read operations to be local to a rack whenever possible.

Note that write/update/delete/admin operations should not be affected.

Initially, I thought it may be good to have Solr nodes tagged with rackID 
(snitch?) for matching the hosts.

Note that this feature may have many uses, such as SOLR-5501.

Note that in our use case, we have a cross-DC deployment, so replace 
rack1/rack2 with DC1/DC2.

Any comments would be much appreciated.

Thanks.



> Allowing SolrJ CloudSolrClient to have preferred replica for query/read
> -----------------------------------------------------------------------
>
>                 Key: SOLR-8146
>                 URL: https://issues.apache.org/jira/browse/SOLR-8146
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java
>    Affects Versions: 5.3
>            Reporter: Arcadius Ahouansou
>         Attachments: SOLR-8146.patch, SOLR-8146.patch, SOLR-8146.patch
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
