[ https://issues.apache.org/jira/browse/CASSANDRA-11615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254175#comment-15254175 ]
Andy Tolbert commented on CASSANDRA-11615:
------------------------------------------

Was digging into this with [~eduard.tudenhoefner], and I suspect this is being caused by [JAVA-1002|https://datastax-oss.atlassian.net/browse/JAVA-1002], which will be fixed in 3.0.1.

I tried this out with a 100 node simulated cluster (not using stress in this case) and a single threaded netty event loop group in the driver (to amplify the impact), and timed how long a session.prepare takes when the keyspace is set on the connection. It took 203ms with the fix for JAVA-1002; without it, it takes a very long time (in the log below, a set-keyspace timeout is logged roughly every 12 seconds).

I think this is the source of the issue, but we'll need to confirm again when we get a large cluster up again in the next week or so.

{noformat}
3.0.0 - 100 nodes, no keyspace set on session - 206ms

42776 [main] INFO OneHundredNodeSimulation - Done Initing Cluster..Preparing Statement
42982 [main] INFO OneHundredNodeSimulation - Done Preparing Statement...making query

3.0.0 - 100 nodes, keyspace set on session - too long..ms

46276 [main] INFO OneHundredNodeSimulation - Done Initing Cluster..Preparing Statement
58429 [cluster1-nio-worker-0] WARN com.datastax.driver.core.Connection - Timeout while setting keyspace on Connection[/127.0.1.1:9042-3, inFlight=1, closed=false]. This should not happen but is not critical (it will be retried)
70510 [cluster1-nio-worker-0] WARN com.datastax.driver.core.Connection - Timeout while setting keyspace on Connection[/127.0.1.3:9042-1, inFlight=1, closed=false]. This should not happen but is not critical (it will be retried)
82609 [cluster1-nio-worker-0] WARN com.datastax.driver.core.Connection - Timeout while setting keyspace on Connection[/127.0.1.4:9042-1, inFlight=1, closed=false]. This should not happen but is not critical (it will be retried)
94725 [cluster1-nio-worker-0] WARN com.datastax.driver.core.Connection - Timeout while setting keyspace on Connection[/127.0.1.5:9042-1, inFlight=1, closed=false]. This should not happen but is not critical (it will be retried)
106818 [cluster1-nio-worker-0] WARN com.datastax.driver.core.Connection - Timeout while setting keyspace on Connection[/127.0.1.6:9042-1, inFlight=1, closed=false]. This should not happen but is not critical (it will be retried)
118908 [cluster1-nio-worker-0] WARN com.datastax.driver.core.Connection - Timeout while setting keyspace on Connection[/127.0.1.7:9042-1, inFlight=1, closed=false]. This should not happen but is not critical (it will be retried)
131008 [cluster1-nio-worker-0] WARN com.datastax.driver.core.Connection - Timeout while setting keyspace on Connection[/127.0.1.8:9042-1, inFlight=1, closed=false]. This should not happen but is not critical (it will be retried)
143109 [cluster1-nio-worker-0] WARN com.datastax.driver.core.Connection - Timeout while setting keyspace on Connection[/127.0.1.9:9042-1, inFlight=1, closed=false]. This should not happen but is not critical (it will be retried)
155207 [cluster1-nio-worker-0] WARN com.datastax.driver.core.Connection - Timeout while setting keyspace on Connection[/127.0.1.10:9042-1, inFlight=1, closed=false]. This should not happen but is not critical (it will be retried)
167308 [cluster1-nio-worker-0] WARN com.datastax.driver.core.Connection - Timeout while setting keyspace on Connection[/127.0.1.11:9042-1, inFlight=1, closed=false]. This should not happen but is not critical (it will be retried)
...

3.0.1rc (has JAVA-1002 fix) - 100 nodes, keyspace set on session - 203ms

46000 [main] INFO OneHundredNodeSimulation - Done Initing Cluster..Preparing Statement
46203 [main] INFO OneHundredNodeSimulation - Done Preparing Statement...making query
{noformat}

> cassandra-stress blocks when connecting to a big cluster
> --------------------------------------------------------
>
>                 Key: CASSANDRA-11615
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11615
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>            Reporter: Eduard Tudenhoefner
>            Assignee: Eduard Tudenhoefner
>             Fix For: 3.0.x
>
>         Attachments: 11615-3.0-2nd.patch, 11615-3.0.patch
>
>
> I had a *100* node cluster and was running
> {code}
> cassandra-stress read n=100 no-warmup cl=LOCAL_QUORUM -rate 'threads=20' 'limit=1000/s'
> {code}
> Based on the thread dump it looks like it's been blocked at
> https://github.com/apache/cassandra/blob/cassandra-3.0/tools/stress/src/org/apache/cassandra/stress/util/JavaDriverClient.java#L96
> {code}
> "Thread-20" #245 prio=5 os_prio=0 tid=0x00007f3781822000 nid=0x46c4 waiting for monitor entry [0x00007f36cc788000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.cassandra.stress.util.JavaDriverClient.prepare(JavaDriverClient.java:96)
>         - waiting to lock <0x00000005c003d920> (a java.util.concurrent.ConcurrentHashMap)
>         at org.apache.cassandra.stress.operations.predefined.CqlOperation$JavaDriverWrapper.createPreparedStatement(CqlOperation.java:314)
>         at org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:77)
>         at org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:109)
>         at org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:261)
>         at org.apache.cassandra.stress.StressAction$Consumer.run(StressAction.java:327)
> "Thread-19" #244 prio=5 os_prio=0 tid=0x00007f3781820000 nid=0x46c3 waiting for monitor entry [0x00007f36cc889000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.cassandra.stress.util.JavaDriverClient.prepare(JavaDriverClient.java:96)
>         - waiting to lock <0x00000005c003d920> (a java.util.concurrent.ConcurrentHashMap)
>         at org.apache.cassandra.stress.operations.predefined.CqlOperation$JavaDriverWrapper.createPreparedStatement(CqlOperation.java:314)
>         at org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:77)
>         at org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:109)
>         at org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:261)
>         at org.apache.cassandra.stress.StressAction$Consumer.run(StressAction.java:327)
> {code}
> I tried the same with a smaller cluster (50 nodes) and it worked fine.
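
For readers trying to picture the thread dump above: it shows stress worker threads BLOCKED while waiting to lock a ConcurrentHashMap inside JavaDriverClient.prepare. The following is a minimal, stdlib-only sketch of that kind of contention, not the actual stress tool code. It assumes a statement cache guarded by a synchronized block on the map, and uses a hypothetical slowPrepare method (a plain sleep) as a stand-in for a session.prepare call that takes a long time. The class name, method names and query string are all made up for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

final class PrepareContentionSketch {
    // Hypothetical statement cache; the real tool stores driver PreparedStatements.
    private final Map<String, String> stmts = new ConcurrentHashMap<>();
    private final AtomicInteger slowPrepares = new AtomicInteger();
    private final long prepareMillis;

    PrepareContentionSketch(long prepareMillis) {
        this.prepareMillis = prepareMillis;
    }

    // Assumption: stands in for a slow session.prepare(query); it only sleeps.
    private String slowPrepare(String query) {
        slowPrepares.incrementAndGet();
        try {
            Thread.sleep(prepareMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "prepared:" + query;
    }

    // Check the cache, then lock the map and prepare once. While one thread
    // is inside slowPrepare, every other caller waits for the map's monitor.
    String prepare(String query) {
        String stmt = stmts.get(query);
        if (stmt != null) {
            return stmt;
        }
        synchronized (stmts) {
            stmt = stmts.get(query);
            if (stmt != null) {
                return stmt;
            }
            stmt = slowPrepare(query);
            stmts.put(query, stmt);
        }
        return stmt;
    }

    int slowPrepareCount() {
        return slowPrepares.get();
    }

    // Starts the given number of workers at once and returns how many slow
    // prepare calls were actually made.
    static int runWorkers(int threads, long prepareMillis) {
        PrepareContentionSketch client = new PrepareContentionSketch(prepareMillis);
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                client.prepare("SELECT * FROM t WHERE key = ?");
            }, "Thread-" + i);
            workers[i].start();
        }
        start.countDown();
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return client.slowPrepareCount();
    }

    public static void main(String[] args) {
        long begin = System.nanoTime();
        int prepares = runWorkers(20, 200);
        long elapsedMs = (System.nanoTime() - begin) / 1_000_000;
        System.out.println(prepares + " slow prepare call(s), workers done after ~" + elapsedMs + " ms");
    }
}
```

With 20 workers (matching threads=20 in the stress command), only one thread performs the slow prepare and the rest wait on the map's monitor, so a jstack taken during the sleep would likely show them in the same BLOCKED state as the dump above. In this sketch the lock only becomes a visible problem when the guarded call itself is slow, which is consistent with the suspicion that JAVA-1002 is the underlying cause.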