[
https://issues.apache.org/jira/browse/SOLR-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ben DeMott updated SOLR-10284:
------------------------------
Description:
I posted this issue on the Dev mailing list and was encouraged to create a Jira
ticket. This isn't a bug per se.
Solr will connect (or reconnect) to a Zookeeper node that is running in
"Standalone" mode even though it is listed as part of an ensemble, which causes
absolute havoc.
I work for Dice.com, as one of the core search developers.
I'm happy to write a patch, as we'll probably do that internally anyway. I
just want to get consensus from the community about how to provide the best
solution.
My original email describing the issue:
http://mail-archives.apache.org/mod_mbox/lucene-dev/201703.mbox/raw/%3CCACbtCQ2cSPA8NbnqCbXZE9nZdT40xFHjpUhAOqUnd%3DqZaRMEsA%40mail.gmail.com%3E/2
Proposed Solution:
My thought was an explicit setting in solr.in.sh, "ZK_STANDALONE" (which would
default to TRUE in the solr.in.sh file found next to bin/solr). Upon
connection or reconnection, the Zookeeper client would ask the server
"are you standalone?". If the server is standalone and ZK_STANDALONE=false, the
client would disconnect and try the next host. If all hosts are standalone, an
error would be shown: "No
zookeeper hosts available, that aren't in standalone operation - The setting
ZK_STANDALONE=false prevents connecting to a standalone Zookeeper"
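To make the idea concrete, here is a rough sketch (in Python, not a patch
against Solr) of the "are you standalone" check. It assumes the check can be
done with Zookeeper's "srvr" four-letter command, whose reply contains a
"Mode:" line (standalone, leader, follower, ...); on newer Zookeeper versions
that command may need to be allowed via 4lw.commands.whitelist. The function
names and the zk_standalone parameter are hypothetical and only mirror the
proposed ZK_STANDALONE setting.

```python
import socket


def fetch_srvr(host, port, timeout=2.0):
    """Send the 'srvr' four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"srvr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the socket after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(srvr_reply):
    """Return the value of the 'Mode:' line, or None if it is missing."""
    for line in srvr_reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def pick_host(hosts, zk_standalone=True, fetch=fetch_srvr):
    """Return the first (host, port) that is acceptable to connect to.

    With zk_standalone=False (hypothetical setting), hosts that report
    'Mode: standalone' are skipped. Unreachable hosts are skipped too.
    """
    for host, port in hosts:
        try:
            mode = parse_mode(fetch(host, port))
        except OSError:
            continue  # host down or refusing connections: try the next one
        if mode is None:
            continue  # reply did not look like a 'srvr' response
        if mode == "standalone" and not zk_standalone:
            continue  # exactly the case this ticket wants to avoid
        return (host, port)
    raise RuntimeError(
        "No zookeeper hosts available, that aren't in standalone operation"
    )
```

The fetch argument is only there so the selection logic can be exercised
without a live Zookeeper; a real client would do this check inside its
connect / reconnect handling.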
To urge users to use the setting, I would possibly also log a warning when
ZK_HOST is set, has multiple hosts in the connection string, and
ZK_STANDALONE is not false.
I can't think of any implicit way to infer the setting, other than this: if the
ZK_HOST connection string has multiple hosts, there should be no
scenario in which any node is standalone, so you could assume there should be
no standalone servers. But maybe an explicit setting is preferable.
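The warning condition above could look roughly like this. It is a sketch only:
it assumes the usual connection string form "host1:2181,host2:2181/chroot",
and the function names and the zk_standalone argument (None meaning "not set")
are made up for illustration.

```python
def zk_servers(conn_string):
    """Split a Zookeeper connection string into its host entries.

    Anything from the first '/' onward is treated as the chroot and ignored.
    """
    servers = conn_string.split("/", 1)[0]
    return [s.strip() for s in servers.split(",") if s.strip()]


def should_warn(conn_string, zk_standalone):
    """True when several hosts are listed but standalone nodes are still allowed.

    zk_standalone is the hypothetical setting: True, False, or None if unset.
    """
    if not conn_string:
        return False
    return len(zk_servers(conn_string)) > 1 and zk_standalone is not False
```

The same host count is what an implicit default would be based on: more than
one host in the string would imply that standalone nodes are never expected.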
This solution should:
1.) be backwards compatible
2.) have very little performance impact (1 extra call upon connection to ZK)
3.) be isolated to one part of the code.
*Update 6/26/2017:*
I started working on this, and it occurred to me the same issue exists for
SolrJ clients. So SolrJ might be the place to make this change. I'm not sure
yet.
A SolrJ client with a multi-zk-node connection string that connects (even
temporarily) to a zk host that is standalone will believe there are no solr
hosts available to answer the request, and you'll get the following error.
``CloudSolrClient - Request to collection efc-profiles-match-col failed due to
(510) org.apache.solr.common.SolrException: Could not find a healthy node to
handle the request.``
I am not as familiar with the SolrJ codebase ... so I'll have to do some
digging.
Instead of moving on to a different Zookeeper host, the SolrJ client will think
everything is fully working, just with no collections.
> Solr connection to Standalone node in Ensemble causes cluster failure
> ---------------------------------------------------------------------
>
> Key: SOLR-10284
> URL: https://issues.apache.org/jira/browse/SOLR-10284
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: SolrCloud
> Affects Versions: 6.3, 6.4
> Environment: Solrcloud, with Zookeeper <any version>
> Reporter: Ben DeMott