[
https://issues.apache.org/jira/browse/SOLR-12415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Grzegorz Lebek updated SOLR-12415:
----------------------------------
Description:
*Context*
When an LBHttpSolrClient is constructed using *base root URLs* and a slave
goes down and then comes back up, the client is unable to mark it as alive
again because the alive check receives a 404 error.
Logs below:
{code:java}
DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 >> "GET
/solr/select?q=%3A&rows=0&sort=docid+asc&distrib=false&wt=javabin&version=2
HTTP/1.1[\r][\n]"
DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 >> "User-Agent:
Solr[org.apache.solr.client.solrj.impl.HttpSolrClient] 1.0[\r][\n]"
DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 >> "Host:
localhost:8984[\r][\n]"
DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 >> "Connection:
Keep-Alive[\r][\n]"
DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 >> "[\r][\n]"
DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "HTTP/1.1 404
Not Found[\r][\n]"
DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 <<
"Cache-Control: must-revalidate,no-cache,no-store[\r][\n]"
DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 <<
"Content-Type: text/html;charset=iso-8859-1[\r][\n]"
DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 <<
"Content-Length: 243[\r][\n]"
DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "[\r][\n]"
DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "<html>[\n]"
DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "<head>[\n]"
DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "<meta
http-equiv="Content-Type" content="text/html;charset=utf-8"/>[\n]"
DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "<title>Error
404 Not Found</title>[\n]"
DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "</head>[\n]"
DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 <<
"<body><h2>HTTP ERROR 404</h2>[\n]"
DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "<p>Problem
accessing /solr/select. Reason:[\n]"
DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "<pre> Not
Found</pre></p>[\n]"
DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "</body>[\n]"
DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 <<
"</html>[\n]"{code}
*Analysis*
When only *base root URLs* are used in an LBHttpSolrClient, a "*collection*"
parameter must be passed with each request. This works fine, except that in
the method
{code:java}
private void checkAZombieServer(ServerWrapper zombieServer){code}
the client queries Solr without the collection parameter to check whether the
server is alive. This causes HTML content to be returned (apparently the
dashboard; the log above shows a 404 error page), so the method falls into its
exception clause. As a result, even when the server is back up it is never
marked as alive again.
I debugged this, and when a collection name is passed there as a second
parameter the server responds correctly.
My suggestion is either to pass the collection name somehow or to change the
way zombie servers are pinged.
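The effect described above can be illustrated with a small self-contained
model. This is only a sketch and not SolrJ code: the class
AliveCheckSketch, its methods and the collection name "products" are all
hypothetical, and the simulated node simply answers 200 for
/solr/<collection>/select and 404 for anything else, which is the behaviour
seen in the wire log.

```java
import java.net.URI;
import java.util.Collections;
import java.util.Set;

/** Toy model of the alive check. NOT SolrJ code; every name here is hypothetical. */
final class AliveCheckSketch {

    /** Builds the select path for a base URL and an optional collection name. */
    static String selectPath(String baseUrl, String collection) {
        String base = URI.create(baseUrl).getPath();
        if (base.endsWith("/")) {
            base = base.substring(0, base.length() - 1); // drop trailing slash
        }
        return collection == null
                ? base + "/select"                       // what the log shows
                : base + "/" + collection + "/select";
    }

    /** Simulated node: 200 only for /solr/<known collection>/select, else 404. */
    static int respond(Set<String> collections, String path) {
        for (String c : collections) {
            if (path.equals("/solr/" + c + "/select")) {
                return 200;
            }
        }
        return 404;
    }

    /** Simplified alive check: a zombie is revived only on HTTP 200. */
    static boolean isAlive(Set<String> collections, String baseUrl, String collection) {
        return respond(collections, selectPath(baseUrl, collection)) == 200;
    }

    public static void main(String[] args) {
        Set<String> collections = Collections.singleton("products"); // assumed name
        String baseUrl = "http://localhost:8984/solr";               // host from the log

        System.out.println(selectPath(baseUrl, null));               // /solr/select
        System.out.println(isAlive(collections, baseUrl, null));     // false
        System.out.println(isAlive(collections, baseUrl, "products")); // true
    }
}
```

In this model the check without a collection can never succeed, however long
the node has been back up, whereas the same check with a collection name
succeeds immediately. That matches the observation that passing the
collection as a second parameter makes the server respond correctly.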
*Steps to reproduce*
Run two servers, a master and a slave. Create a client using base URLs. Index,
test search, etc.
Turn off the slave server and, after a couple of seconds, turn it on again.
was:
*Context*
When an LBHttpSolrClient is constructed using *base URLs* and a slave goes
down and then comes back up, the client is unable to mark it as alive again
because the alive check receives a 404 error.
Logs below:
{code:java}
DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 >> "GET
/solr/select?q=%3A&rows=0&sort=docid+asc&distrib=false&wt=javabin&version=2
HTTP/1.1[\r][\n]"
DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 >> "User-Agent:
Solr[org.apache.solr.client.solrj.impl.HttpSolrClient] 1.0[\r][\n]"
DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 >> "Host:
localhost:8984[\r][\n]"
DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 >> "Connection:
Keep-Alive[\r][\n]"
DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 >> "[\r][\n]"
DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "HTTP/1.1 404
Not Found[\r][\n]"
DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 <<
"Cache-Control: must-revalidate,no-cache,no-store[\r][\n]"
DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 <<
"Content-Type: text/html;charset=iso-8859-1[\r][\n]"
DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 <<
"Content-Length: 243[\r][\n]"
DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "[\r][\n]"
DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "<html>[\n]"
DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "<head>[\n]"
DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "<meta
http-equiv="Content-Type" content="text/html;charset=utf-8"/>[\n]"
DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "<title>Error
404 Not Found</title>[\n]"
DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "</head>[\n]"
DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 <<
"<body><h2>HTTP ERROR 404</h2>[\n]"
DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "<p>Problem
accessing /solr/select. Reason:[\n]"
DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "<pre> Not
Found</pre></p>[\n]"
DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "</body>[\n]"
DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 <<
"</html>[\n]"{code}
*Analysis*
When only *base URLs* are used in an LBHttpSolrClient, a "*collection*"
parameter must be passed with each request. This works fine, except that in
the method
{code:java}
private void checkAZombieServer(ServerWrapper zombieServer){code}
the client queries Solr without the collection parameter to check whether the
server is alive. This causes HTML content to be returned (apparently the
dashboard; the log above shows a 404 error page), so the method falls into its
exception clause. As a result, even when the server is back up it is never
marked as alive again.
I debugged this, and when a collection name is passed there as a second
parameter the server responds correctly.
My suggestion is either to pass the collection name somehow or to change the
way zombie servers are pinged.
*Steps to reproduce*
Run two servers, a master and a slave. Create a client using base URLs. Index,
test search, etc.
Turn off the slave server and, after a couple of seconds, turn it on again.
> Solr Loadbalancer client LBHttpSolrClient not working as expected, if a Solr
> node goes down, it is unable to detect when it become live again due to 404
> error
> --------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: SOLR-12415
> URL: https://issues.apache.org/jira/browse/SOLR-12415
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: SolrJ
> Affects Versions: 7.2.1, 7.3.1, 7.4
> Environment: Solr 7.2.1
> 2 servers - master and slave.
> Reporter: Grzegorz Lebek
> Priority: Critical
>