Hi Matt, here is more information:
1. "There’s a lot of missing information here, like version of the cluster, what client you’re using, what the workload is"

   Cluster: Couchbase 2.5
   Client: Spymemcached 2.8.4

   The test tool creates 200 threads that connect to the Couchbase cluster. Each thread sets a key and immediately gets it back to verify the write. On success it continues with the next key. On failure it retries the Set/Get and skips the key after 5 failed attempts. I can see the throughput drop in the Couchbase Web Console (http://ip:8091/).

2. I rewrote the test as a simple loop as an example, and it fails too. Normally I get 300-400 ops, but when a server is down it only serves 20-30 ops. My code:

    import java.util.concurrent.Future;
    import java.util.concurrent.TimeUnit;
    import net.spy.memcached.AddrUtil;
    import net.spy.memcached.BinaryConnectionFactory;
    import net.spy.memcached.MemcachedClient;

    try {
        MemcachedClient c = new MemcachedClient(
                new BinaryConnectionFactory(),
                AddrUtil.getAddresses("10.0.0.20:11234 10.0.0.23:11234 10.0.0.24:11234"));
        // 'value' is the test payload, defined elsewhere in the tool (not shown).
        for (int i = 0; i < 3000; i++) {
            String ini_key = "test_key";
            String key = ini_key + i;
            Future<Object> f = null;
            try {
                c.set(key, 0, value);
                f = c.asyncGet(key);
                // Wait up to 5 seconds for the get to complete.
                Object result = f.get(5, TimeUnit.SECONDS);
                boolean check = f.isDone();
                if (check) {
                    System.out.println(key + " " + check);
                }
            } catch (Exception e) {
                e.printStackTrace();
                // f is still null if set() or asyncGet() threw.
                if (f != null) {
                    f.cancel(false);
                }
            }
        }
    } catch (Exception ex) {
        ex.printStackTrace();
    }

This is the log output. In this case I stopped the Couchbase service on server 10.0.0.28. I would expect connection problems only for that server, but the log shows connection errors for all servers in the cluster:

2014-05-03 23:40:42.241 ERROR net.spy.memcached.protocol.binary.StoreOperationImpl: Error: Internal error
2014-05-03 23:40:42.242 INFO net.spy.memcached.MemcachedConnection: Reconnection due to exception handling a memcached operation on {QA sa=/10.0.0.24:11234, #Rops=2, #Wops=0, #iq=0, topRop=Cmd: 1 Opaque: 10957 Key: test_key5478 Cas: 0 Exp: 0 Flags: 0 Data Length: 804, topWop=null, toWrite=0, interested=1}. This may be due to an authentication failure.
OperationException: SERVER: Internal error
    at net.spy.memcached.protocol.BaseOperationImpl.handleError(BaseOperationImpl.java:192)
    at net.spy.memcached.protocol.binary.OperationImpl.getStatusForErrorCode(OperationImpl.java:244)
    at net.spy.memcached.protocol.binary.OperationImpl.finishedPayload(OperationImpl.java:201)
    at net.spy.memcached.protocol.binary.OperationImpl.readPayloadFromBuffer(OperationImpl.java:196)
    at net.spy.memcached.protocol.binary.OperationImpl.readFromBuffer(OperationImpl.java:139)
    at net.spy.memcached.MemcachedConnection.readBufferAndLogMetrics(MemcachedConnection.java:825)
    at net.spy.memcached.MemcachedConnection.handleReads(MemcachedConnection.java:804)
    at net.spy.memcached.MemcachedConnection.handleReadsAndWrites(MemcachedConnection.java:684)
    at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:647)
    at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:418)
    at net.spy.memcached.MemcachedConnection.run(MemcachedConnection.java:1400)
2014-05-03 23:40:42.242 WARN net.spy.memcached.MemcachedConnection: Closing, and reopening {QA sa=/10.0.0.24:11234, #Rops=2, #Wops=0, #iq=0, topRop=Cmd: 1 Opaque: 10957 Key: test_key5478 Cas: 0 Exp: 0 Flags: 0 Data Length: 804, topWop=null, toWrite=0, interested=1}, attempt 0.
2014-05-03 23:40:42.242 WARN net.spy.memcached.protocol.binary.BinaryMemcachedNodeImpl: Discarding partially completed op: Cmd: 1 Opaque: 10957 Key: test_key5478 Cas: 0 Exp: 0 Flags: 0 Data Length: 804
2014-05-03 23:40:42.242 WARN net.spy.memcached.protocol.binary.BinaryMemcachedNodeImpl: Discarding partially completed op: Cmd: 0 Opaque: 10958 Key: test_key5478
java.util.concurrent.ExecutionException: java.util.concurrent.CancellationException: Cancelled
    at net.spy.memcached.internal.OperationFuture.get(OperationFuture.java:177)
    at net.spy.memcached.internal.GetFuture.get(GetFuture.java:69)
    at toolcb.go(toolcb.java:45)
    at toolcb.main(toolcb.java:14)
Caused by: java.util.concurrent.CancellationException: Cancelled
    ... 4 more
test_key5479 true
test_key5480 true
test_key5481 true
test_key5482 true
test_key5483 true
test_key5484 true
test_key5485 true
test_key5486 true
test_key5487 true
2014-05-03 23:40:42.318 ERROR net.spy.memcached.protocol.binary.StoreOperationImpl: Error: Internal error
2014-05-03 23:40:42.319 INFO net.spy.memcached.MemcachedConnection: Reconnection due to exception handling a memcached operation on {QA sa=/10.0.0.20:11234, #Rops=2, #Wops=0, #iq=0, topRop=Cmd: 1 Opaque: 10977 Key: test_key5488 Cas: 0 Exp: 0 Flags: 0 Data Length: 804, topWop=null, toWrite=0, interested=1}. This may be due to an authentication failure.
OperationException: SERVER: Internal error
    at net.spy.memcached.protocol.BaseOperationImpl.handleError(BaseOperationImpl.java:192)
    at net.spy.memcached.protocol.binary.OperationImpl.getStatusForErrorCode(OperationImpl.java:244)
    at net.spy.memcached.protocol.binary.OperationImpl.finishedPayload(OperationImpl.java:201)
    at net.spy.memcached.protocol.binary.OperationImpl.readPayloadFromBuffer(OperationImpl.java:196)
    at net.spy.memcached.protocol.binary.OperationImpl.readFromBuffer(OperationImpl.java:139)
    at net.spy.memcached.MemcachedConnection.readBufferAndLogMetrics(MemcachedConnection.java:825)
    at net.spy.memcached.MemcachedConnection.handleReads(MemcachedConnection.java:804)
    at net.spy.memcached.MemcachedConnection.handleReadsAndWrites(MemcachedConnection.java:684)
    at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:647)
    at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:418)
    at net.spy.memcached.MemcachedConnection.run(MemcachedConnection.java:1400)
2014-05-03 23:40:42.320 WARN net.spy.memcached.MemcachedConnection: Closing, and reopening {QA sa=/10.0.0.20:11234, #Rops=2, #Wops=0, #iq=0, topRop=Cmd: 1 Opaque: 10977 Key: test_key5488 Cas: 0 Exp: 0 Flags: 0 Data Length: 804, topWop=null, toWrite=0, interested=1}, attempt 0.
2014-05-03 23:40:42.320 WARN net.spy.memcached.protocol.binary.BinaryMemcachedNodeImpl: Discarding partially completed op: Cmd: 1 Opaque: 10977 Key: test_key5488 Cas: 0 Exp: 0 Flags: 0 Data Length: 804
2014-05-03 23:40:42.320 WARN net.spy.memcached.protocol.binary.BinaryMemcachedNodeImpl: Discarding partially completed op: Cmd: 0 Opaque: 10978 Key: test_key5488
java.util.concurrent.ExecutionException: java.util.concurrent.CancellationException: Cancelled
    at net.spy.memcached.internal.OperationFuture.get(OperationFuture.java:177)
    at net.spy.memcached.internal.GetFuture.get(GetFuture.java:69)
    at toolcb.go(toolcb.java:45)
    at toolcb.main(toolcb.java:14)
Caused by: java.util.concurrent.CancellationException: Cancelled
    ... 4 more
test_key5489 true

On Friday, 2 May 2014 17:03:27 UTC+7, Phuc Huu wrote:
>
> I'm testing Couchbase Server 2.5. I have a cluster with 7 nodes and 3
> replicates. In normal condition, the system works fine.
>
> But I failed with this test case: Couchbase cluster's serving 40.000 ops
> and I stop couchbase service on one server => one node down. After that,
> entire cluster's performance is decreased painfully. It only can server
> below 1.000 ops. When I click fail-over then entire cluster return healthy.
>
> Is this right behavior that Couchbase cluster response when one nodes
> down??? Couchbase cluster will lose nearly all performance until i
> fail-over.
>
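
P.S. To make the workload from point 1 concrete, here is a minimal sketch of the per-thread Set/Get retry logic (set, read back, retry, skip the key after 5 failed attempts). It is not the actual tool code: `KeyValueStore`, `FlakyStore`, `setAndVerify` and `MAX_ATTEMPTS` are hypothetical names, and an in-memory map stands in for the Spymemcached client so the sketch runs on its own.

```java
import java.util.HashMap;
import java.util.Map;

public class SetGetRetrySketch {

    /** Hypothetical stand-in for the real client; not part of Spymemcached. */
    interface KeyValueStore {
        void set(String key, String value) throws Exception;
        String get(String key) throws Exception;
    }

    /** The tool gives up on a key after this many failed attempts. */
    static final int MAX_ATTEMPTS = 5;

    /**
     * Sets the key, reads it back, and retries on any failure.
     * Returns the 1-based attempt that succeeded, or -1 if the key was skipped.
     */
    static int setAndVerify(KeyValueStore store, String key, String value) {
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            try {
                store.set(key, value);
                if (value.equals(store.get(key))) {
                    return attempt;
                }
            } catch (Exception e) {
                // Count this as a failed attempt and retry.
            }
        }
        return -1; // all attempts failed: skip this key
    }

    /** In-memory store whose first N calls to set() fail (simulated outage). */
    static final class FlakyStore implements KeyValueStore {
        private final Map<String, String> data = new HashMap<>();
        private int failuresLeft;

        FlakyStore(int failures) {
            this.failuresLeft = failures;
        }

        public void set(String key, String value) throws Exception {
            if (failuresLeft > 0) {
                failuresLeft--;
                throw new Exception("simulated node failure");
            }
            data.put(key, value);
        }

        public String get(String key) {
            return data.get(key);
        }
    }

    public static void main(String[] args) {
        System.out.println(setAndVerify(new FlakyStore(0), "test_key0", "v")); // prints 1
        System.out.println(setAndVerify(new FlakyStore(2), "test_key1", "v")); // prints 3
        System.out.println(setAndVerify(new FlakyStore(5), "test_key2", "v")); // prints -1
    }
}
```

In the real tool each of the 200 threads runs this kind of loop against the cluster, so every failed attempt on a key that maps to the stopped node holds a thread up until the operation errors out or times out.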
