Peter Horvath created SOLR-9129:
-----------------------------------

             Summary: Solr Cloud hangs when creating large number of collections and node fails to recover after restart
                 Key: SOLR-9129
                 URL: https://issues.apache.org/jira/browse/SOLR-9129
             Project: Solr
          Issue Type: Bug
          Components: Server
    Affects Versions: 6.0
         Environment: OS: GNU Linux, kernel 4.4.0-22 on x86_64 (Ubuntu Linux 16.04 LTS (64-bit))
RAM: 16 GB
CPU: Intel Core i7-4720HQ CPU @ 2.60GHz × 8
Java version: Oracle JDK 1.8.0_92 (x64) build 1.8.0_92-b14 Java HotSpot(TM) 64-Bit Server VM (build 25.92-b14, mixed mode)
            Reporter: Peter Horvath


I attempted to benchmark SolrCloud to see how well it would work with a sample 
data set of ours. As a first step, I wanted to create about 2500 empty 
collections to see how Solr would scale with that many collections.

Unfortunately, the test was not successful. Solr started failing after creating 
around 2000 collections, and the cluster failed to recover even after a complete 
restart, which is quite concerning to me.

I based my environment on the cloud example (using the same config set as the 
gettingstarted example collection, etc.), so this is a vanilla install. I used 
the following commands to bring the nodes online:

.../solr/6.0.0/bin/solr start -m 2g -cloud -p 8983 -s ".../solr/6.0.0/example/cloud/node1/solr"
.../solr/6.0.0/bin/solr start -m 2g -cloud -p 7574 -s ".../solr/6.0.0/example/cloud/node2/solr" -z localhost:9983
.../solr/6.0.0/bin/solr start -m 2g -cloud -p 8984 -s ".../solr/6.0.0/example/cloud/node3/solr" -z localhost:9983
.../solr/6.0.0/bin/solr start -m 2g -cloud -p 7575 -s ".../solr/6.0.0/example/cloud/node4/solr" -z localhost:9983
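The report does not show how the collections were created. One way to reproduce the load is to call the Collections API CREATE action in a loop. The following is a minimal, Java 8 compatible sketch under assumptions that are not from the report: every collection reuses the config set uploaded under the name gettingstarted, each has one shard and one replica, and the names bench_0 ... bench_2499 are hypothetical. It stops at the first non-200 response and prints the latency of every 100th request.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

// Sketch: create many empty SolrCloud collections through the Collections API.
// Assumptions (not from the report): config set "gettingstarted",
// 1 shard x 1 replica per collection, hypothetical names "bench_<i>".
public class CreateManyCollections {

    // URL-encodes a query parameter value; UTF-8 is always available on the JVM.
    private static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds a Collections API CREATE request URL for a single collection.
    public static String createUrl(String solrBase, String name, String configName,
                                   int numShards, int replicationFactor) {
        return solrBase + "/admin/collections?action=CREATE"
                + "&name=" + enc(name)
                + "&collection.configName=" + enc(configName)
                + "&numShards=" + numShards
                + "&replicationFactor=" + replicationFactor
                + "&wt=json";
    }

    // Issues a GET request, discards the response body and returns the HTTP status.
    static int send(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(300_000); // collection creation can be slow under load
        int status = conn.getResponseCode();
        InputStream body = status < 400 ? conn.getInputStream() : conn.getErrorStream();
        if (body != null) {
            byte[] buf = new byte[8192];
            while (body.read(buf) != -1) {
                // discard the JSON response; this sketch does not parse it
            }
            body.close();
        }
        conn.disconnect();
        return status;
    }

    public static void main(String[] args) throws IOException {
        String solrBase = "http://localhost:8983/solr";
        int total = 2500;
        for (int i = 0; i < total; i++) {
            String url = createUrl(solrBase, "bench_" + i, "gettingstarted", 1, 1);
            long start = System.nanoTime();
            int status = send(url);
            long millis = (System.nanoTime() - start) / 1_000_000;
            // Only the HTTP status is checked; a 200 response may still report
            // per-replica failures in its body, which is not inspected here.
            if (status != 200) {
                System.err.println("CREATE bench_" + i + " failed with HTTP " + status);
                break;
            }
            if (i % 100 == 0) {
                System.out.println("created bench_" + i + " in " + millis + " ms");
            }
        }
    }
}
```

Sending requests one at a time keeps the load on the Overseer sequential, so a failure can be tied to a specific collection count, and the logged latencies show whether creation slows down well before the first outright failure.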

After about 2000 collections had been created, Solr hung and REST requests 
started failing. I found the following entry in the logs, which I could relate 
to one of the failed REST requests. For further logs, please see the attachment 
of this issue.

null:org.apache.solr.common.SolrException: Could not fully create collection: FOOBAR
        at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:266)
        at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:197)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)
        at org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:658)
        at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:441)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:229)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:184)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
        at org.eclipse.jetty.server.Server.handle(Server.java:518)
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)
        at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
        at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
        at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)
        at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
        at java.lang.Thread.run(Thread.java:745)

When the affected Solr instance failed to recover, I restarted the whole 
cluster (using the standard bin/solr stop and start commands). Unfortunately, 
even after this, at least one node remained stuck spinning in ZooKeeper-related 
code, creating more than four thousand (!!) threads.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)