Peter Horvath created SOLR-9129:
-----------------------------------
Summary: Solr Cloud hangs when creating large number of
collections and node fails to recover after restart
Key: SOLR-9129
URL: https://issues.apache.org/jira/browse/SOLR-9129
Project: Solr
Issue Type: Bug
Components: Server
Affects Versions: 6.0
Environment: OS: GNU Linux, kernel 4.4.0-22 on x86_64 (Ubuntu Linux
16.04 LTS (64-bit))
RAM: 16 GB
CPU: Intel Core i7-4720HQ CPU @ 2.60GHz × 8
Java version: Oracle JDK 1.8.0_92 (x64) build 1.8.0_92-b14 Java HotSpot(TM)
64-Bit Server VM (build 25.92-b14, mixed mode)
Reporter: Peter Horvath
I attempted to benchmark SolrCloud to see how well it would work with a
sample data set of ours.
I wanted to create about 2500 empty collections first to see how that would
scale.
Unfortunately, the test was not successful. Solr started failing after creating
around 2000 collections, and the cluster failed to recover after a complete
restart, which is quite concerning to me.
I based my environment on the cloud example (I use the same config set as the
gettingstarted example collection, etc.), so this is a vanilla install. I used
the following commands to bring the nodes online:
.../solr/6.0.0/bin/solr start -m 2g -cloud -p 8983 -s ".../solr/6.0.0/example/cloud/node1/solr"
.../solr/6.0.0/bin/solr start -m 2g -cloud -p 7574 -s ".../solr/6.0.0/example/cloud/node2/solr" -z localhost:9983
.../solr/6.0.0/bin/solr start -m 2g -cloud -p 8984 -s ".../solr/6.0.0/example/cloud/node3/solr" -z localhost:9983
.../solr/6.0.0/bin/solr start -m 2g -cloud -p 7575 -s ".../solr/6.0.0/example/cloud/node4/solr" -z localhost:9983
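For reference, a collection-creation loop of this kind can be sketched as
follows. This is a minimal sketch, not the exact script used in the test: it
assumes the Collections API CREATE action is called over HTTP against the first
node, and the function names, the "bench_" name prefix, the shard and replica
counts, and the "gettingstarted" config set name are all illustrative
assumptions.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def create_collection_url(base_url, name, config_name="gettingstarted",
                          num_shards=1, replication_factor=1):
    """Build a Collections API CREATE URL.

    config_name, num_shards and replication_factor are assumed values,
    not settings taken from the original test.
    """
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "collection.configName": config_name,
        "wt": "json",
    }
    return base_url.rstrip("/") + "/admin/collections?" + urlencode(params)


def create_collections(base_url, count, prefix="bench_", opener=urlopen):
    """Create `count` empty collections one after another.

    Stops at the first failed request and returns a list of
    (collection_name, error_message) pairs, so the point at which the
    cluster stops responding is visible.
    """
    failures = []
    for i in range(count):
        name = f"{prefix}{i:04d}"
        try:
            with opener(create_collection_url(base_url, name),
                        timeout=180) as resp:
                resp.read()
        except OSError as exc:  # URLError and HTTPError are OSError subclasses
            failures.append((name, str(exc)))
            break
    return failures


# Example call (needs a running cluster, so it is left commented out):
# create_collections("http://localhost:8983/solr", 2500)
```

Creating the collections sequentially and stopping at the first error keeps the
failure point easy to correlate with the server logs.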
After about 2000 collections had been created, Solr hung and REST requests
started failing. I found the following entry in the logs, which I could relate
to the failed REST request. For further logs, please see the attachment of this
issue.
null:org.apache.solr.common.SolrException: Could not fully create collection: FOOBAR
    at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:266)
    at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:197)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)
    at org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:658)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:441)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:229)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:184)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
    at org.eclipse.jetty.server.Server.handle(Server.java:518)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
    at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
    at java.lang.Thread.run(Thread.java:745)
After the affected Solr instance failed to recover, I decided to restart the
whole cluster (using the official solr stop and start commands). Unfortunately,
after the restart, at least one node remained spinning in ZooKeeper logic,
creating more than four thousand (!!) threads.
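One way to watch the thread count of a node on Linux is to read the "Threads:"
field of /proc/<pid>/status. The snippet below is a minimal sketch under that
assumption; it is not necessarily how the figure above was obtained, and the
function names are illustrative.

```python
def parse_thread_count(status_text):
    """Extract the value of the 'Threads:' field from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split(":", 1)[1].strip())
    raise ValueError("no 'Threads:' field found")


def thread_count(pid):
    """Return the current number of threads of the process with the given PID."""
    with open(f"/proc/{pid}/status") as fh:
        return parse_thread_count(fh.read())
```

Calling thread_count() with the PID of a Solr node's JVM at intervals shows
whether the number of threads keeps growing after the restart.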
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)