[ https://issues.apache.org/jira/browse/SOLR-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ben DeMott updated SOLR-10284:
------------------------------
    Description: 
I posted this issue on the dev mailing list and was encouraged to create a Jira 
ticket. This isn't a bug per se.

Solr connects (and reconnects) to ZooKeeper nodes that are running in 
"standalone" mode even though they belong to an ensemble, which causes absolute 
havoc.

I work for Dice.com as one of the core search developers.
I'm happy to write a patch, since we'll probably do that internally anyway; I 
just want to get consensus from the community on the best solution.

My original email describing the issue: 
http://mail-archives.apache.org/mod_mbox/lucene-dev/201703.mbox/raw/%3CCACbtCQ2cSPA8NbnqCbXZE9nZdT40xFHjpUhAOqUnd%3DqZaRMEsA%40mail.gmail.com%3E/2

Proposed Solution:

My thought was an explicit setting in solr.in.sh, "ZK_STANDALONE" (which would 
default to TRUE in the solr.in.sh file found next to bin/solr). Upon 
connection or reconnection, the ZooKeeper client would ask the server "are you 
standalone?"; if it is and ZK_STANDALONE=false, the client would disconnect and 
try the next host. If all hosts are standalone, an error would be shown: "No 
ZooKeeper hosts available that aren't in standalone operation. The setting 
ZK_STANDALONE=false prevents connecting to a standalone ZooKeeper."
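A minimal sketch of the proposed check, written in Python for brevity rather 
than as a Solr patch. It assumes the client learns a server's mode from 
ZooKeeper's "srvr" four-letter-word command, whose reply contains a line such as 
"Mode: standalone", "Mode: leader" or "Mode: follower" (on ZooKeeper 3.5.3 and 
later, four-letter words must be whitelisted via 4lw.commands.whitelist; "srvr" 
is whitelisted by default). The function names parse_mode, probe_mode and 
pick_host are hypothetical; zk_standalone stands in for the proposed 
ZK_STANDALONE setting.

```python
import socket


def parse_mode(srvr_output):
    """Return the value of the 'Mode:' line in a ZooKeeper 'srvr' reply, or None."""
    for line in srvr_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    # No Mode line, e.g. "This ZooKeeper instance is not currently serving requests"
    return None


def probe_mode(host, port, timeout=2.0):
    """Send the 'srvr' four-letter word to one server and return its reported mode."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"srvr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return parse_mode(b"".join(chunks).decode("utf-8", errors="replace"))


def pick_host(hosts, zk_standalone, probe=probe_mode):
    """Return the first (host, port) acceptable under the proposed ZK_STANDALONE rule.

    hosts: list of (host, port) tuples in connection-string order.
    zk_standalone: False means standalone servers must be skipped.
    """
    for host, port in hosts:
        try:
            mode = probe(host, port)
        except OSError:
            continue  # unreachable: try the next host
        if mode is None:
            continue  # not serving requests (e.g. lost quorum): try the next host
        if mode == "standalone" and not zk_standalone:
            continue  # the proposed check: refuse a standalone server
        return host, port
    # In this sketch the same error also covers hosts that were unreachable.
    raise RuntimeError(
        "No ZooKeeper hosts available that aren't in standalone operation. "
        "The setting ZK_STANDALONE=false prevents connecting to a standalone ZooKeeper."
    )
```

Passing probe as a parameter keeps the selection rule testable without a live 
ensemble; per the proposal, a real implementation would run the same check on 
every reconnection, not only at startup.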

To encourage users to adopt the setting, I might also log a warning if 
ZK_HOST is set, lists multiple hosts in the connection string, and 
ZK_STANDALONE is not false.
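The proposed log warning reduces to a simple predicate over two solr.in.sh 
values. This sketch assumes both arrive as raw strings (or None when unset), 
reads the connection string from ZK_HOST (the variable name solr.in.sh uses), 
strips an optional chroot suffix such as /solr, and treats an unset 
ZK_STANDALONE as the proposed default of true. zk_host_list, should_warn, 
maybe_warn and the logger name are hypothetical.

```python
import logging

# Hypothetical logger name, for illustration only.
log = logging.getLogger("zk-standalone-check")


def zk_host_list(zk_host):
    """Split 'h1:2181,h2:2181/solr' into ['h1:2181', 'h2:2181'] (chroot dropped)."""
    hosts_part = zk_host.split("/", 1)[0]
    return [h.strip() for h in hosts_part.split(",") if h.strip()]


def should_warn(zk_host, zk_standalone):
    """True when ZK_HOST is set, lists several hosts, and ZK_STANDALONE is not 'false'.

    Both arguments are raw solr.in.sh values; None means the variable is unset,
    and an unset ZK_STANDALONE is treated as the proposed default of true.
    """
    if not zk_host:
        return False
    standalone_value = (zk_standalone or "true").strip().lower()
    return len(zk_host_list(zk_host)) > 1 and standalone_value != "false"


def maybe_warn(zk_host, zk_standalone):
    """Log the proposed warning when should_warn() says so."""
    if should_warn(zk_host, zk_standalone):
        log.warning(
            "ZK_HOST lists multiple ZooKeeper hosts but ZK_STANDALONE is not false; "
            "Solr may attach to a ZooKeeper node running in standalone mode."
        )
```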

I can't think of any implicit way to infer this setting, other than one: if 
the ZK_HOST connection string lists multiple hosts, there should be no 
scenario in which any node is standalone, so you could assume there should be 
no standalone servers. But maybe an explicit setting is preferable.
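One way to combine the two ideas is to honour ZK_STANDALONE when it is given and 
otherwise infer it from the number of hosts. A sketch under those assumptions; 
standalone_allowed is a hypothetical name, and the chroot handling follows 
ZooKeeper's host:port[,host:port...][/chroot] connection-string format.

```python
def standalone_allowed(zk_host, zk_standalone=None):
    """Decide whether a standalone ZooKeeper server is an acceptable connection.

    zk_standalone is the raw ZK_STANDALONE value (None when unset). An explicit
    value wins; otherwise apply the implicit rule: a connection string naming
    more than one host implies an ensemble, so standalone servers are refused.
    """
    if zk_standalone is not None:
        return zk_standalone.strip().lower() != "false"
    # Drop the optional /chroot suffix, then count the comma-separated hosts.
    hosts = [h for h in zk_host.split("/", 1)[0].split(",") if h.strip()]
    return len(hosts) <= 1
```

Letting an explicit value take precedence means anyone who sets 
ZK_STANDALONE=true keeps today's behaviour regardless of the inference.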

This solution should:
1.) be backwards compatible
2.) have very little performance impact (one extra call upon connection to ZK)
3.) be isolated to one part of the code.

*Update 6/26/2017:*

I started working on this, and it occurred to me that the same issue exists for 
*SolrJ* clients, so SolrJ might be the place to make this change. I'm not sure 
yet.
A SolrJ client with a multi-host ZK connection string that connects (even 
temporarily) to a standalone ZK host will believe there are no Solr hosts that 
can answer the query, and you'll get the following error:

{{CloudSolrClient - Request to collection efc-profiles-match-col failed due to 
(510) org.apache.solr.common.SolrException: Could not find a healthy node to 
handle the request.}}

I'm not as familiar with the SolrJ codebase, so I'll have to do some digging.

Instead of moving on to a different ZooKeeper host, the SolrJ client thinks 
everything is working fine; there are simply no collections.
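Until a fix lands in Solr or SolrJ, a client application could at least detect 
this condition by asking every host in its connection string for its mode. This 
is a hedged sketch, not SolrJ API: it assumes ZooKeeper's "srvr" four-letter 
word is reachable (whitelisted by default on 3.5.3 and later) and uses the 
default client port 2181 when an entry omits the port. srvr_mode and 
standalone_members are hypothetical names.

```python
import socket


def srvr_mode(host, port, timeout=2.0):
    """Ask one ZooKeeper server for its mode using the 'srvr' four-letter word."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"srvr")
        reply = b""
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            reply += data
    for line in reply.decode("utf-8", errors="replace").splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None  # no Mode line, e.g. the server is not serving requests


def standalone_members(zk_host, probe=srvr_mode):
    """List the entries of a multi-host connection string that report standalone mode."""
    entries = [h.strip() for h in zk_host.split("/", 1)[0].split(",") if h.strip()]
    if len(entries) < 2:
        return []  # a single-host string is allowed to point at a standalone server
    standalone = []
    for entry in entries:
        host, _, port = entry.rpartition(":")
        if not host:  # no explicit port: use ZooKeeper's default client port
            host, port = entry, "2181"
        try:
            if probe(host, int(port)) == "standalone":
                standalone.append(entry)
        except OSError:
            pass  # unreachable hosts are a different failure; not reported here
    return standalone
```

A non-empty result for a multi-host string means at least one ensemble member 
is running standalone. Because the problem can appear on any reconnect, such a 
check would need to run periodically rather than once at startup.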
 



> Solr connection to Standalone node in Ensemble causes cluster failure
> ---------------------------------------------------------------------
>
>                 Key: SOLR-10284
>                 URL: https://issues.apache.org/jira/browse/SOLR-10284
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>    Affects Versions: 6.3, 6.4
>         Environment: Solrcloud, with Zookeeper <any version>
>            Reporter: Ben DeMott



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
