Why is it that when I stop the Couchbase service on one server in the cluster, I can no longer connect to the remaining servers, as the error log in my previous message shows? Can you help me explain this? Thanks so much.
On Sunday, 4 May 2014 01:36:41 UTC+7, Phuc Huu wrote:
>
> Hi Matt,
>
> Here is some more information:
>
> 1. "There's a lot of missing information here, like version of the cluster,
> what client you're using, what the workload is":
>
> Cluster: Couchbase 2.5
> Client: spymemcached 2.8.4
>
> The tool creates 200 threads that connect to the Couchbase cluster. Each
> thread Sets a key and immediately Gets it back to check it; if that
> succeeds it moves on to the next key, otherwise it retries the Set/Get and
> skips the key after 5 failed attempts.
>
> I can see the cluster's throughput drop in the Couchbase Web Console at
> http://ip:8091/.
>
> 2. I rewrote the loop as a simpler example, but it fails too. Normally I
> get 300-400 ops, but when one server is down it only serves 20-30 ops.
>
> My code:
>
> import java.util.concurrent.Future;
> import java.util.concurrent.TimeUnit;
>
> import net.spy.memcached.AddrUtil;
> import net.spy.memcached.BinaryConnectionFactory;
> import net.spy.memcached.MemcachedClient;
>
> public class toolcb {
>
>     public static void main(String[] args) {
>         new toolcb().go();
>     }
>
>     void go() {
>         String value = "...";  // test payload (definition not shown in the original post)
>         try {
>             MemcachedClient c = new MemcachedClient(
>                     new BinaryConnectionFactory(),
>                     AddrUtil.getAddresses(
>                             "10.0.0.20:11234 10.0.0.23:11234 10.0.0.24:11234"));
>
>             for (int i = 0; i < 3000; i++) {
>                 String ini_key = "test_key";
>                 String key = ini_key + i;
>                 Future<Object> f = null;
>                 try {
>                     c.set(key, 0, value);
>                     f = c.asyncGet(key);
>
>                     Object result = f.get(5, TimeUnit.SECONDS);
>                     boolean check = f.isDone();
>
>                     if (check) {
>                         System.out.println(key + " " + check);
>                     }
>                 } catch (Exception e) {
>                     e.printStackTrace();
>                     if (f != null) {  // guard: set() may fail before asyncGet() runs
>                         f.cancel(false);
>                     }
>                 }
>             }
>         } catch (Exception ex) {
>             ex.printStackTrace();
>         }
>     }
> }
>
> This is the log output (in this case I stopped the Couchbase service on
> server 10.0.0.28; I expected a connection problem only for that server, but
> connection errors are reported for every server in the cluster):
>
> 2014-05-03 23:40:42.241 ERROR net.spy.memcached.protocol.binary.StoreOperationImpl: Error: Internal error
> 2014-05-03 23:40:42.242 INFO net.spy.memcached.MemcachedConnection: Reconnection due to exception handling a memcached operation on {QA sa=/10.0.0.24:11234, #Rops=2, #Wops=0, #iq=0, topRop=Cmd: 1 Opaque: 10957 Key: test_key5478 Cas: 0 Exp: 0 Flags: 0 Data Length: 804, topWop=null, toWrite=0, interested=1}. This may be due to an authentication failure.
> OperationException: SERVER: Internal error
>     at net.spy.memcached.protocol.BaseOperationImpl.handleError(BaseOperationImpl.java:192)
>     at net.spy.memcached.protocol.binary.OperationImpl.getStatusForErrorCode(OperationImpl.java:244)
>     at net.spy.memcached.protocol.binary.OperationImpl.finishedPayload(OperationImpl.java:201)
>     at net.spy.memcached.protocol.binary.OperationImpl.readPayloadFromBuffer(OperationImpl.java:196)
>     at net.spy.memcached.protocol.binary.OperationImpl.readFromBuffer(OperationImpl.java:139)
>     at net.spy.memcached.MemcachedConnection.readBufferAndLogMetrics(MemcachedConnection.java:825)
>     at net.spy.memcached.MemcachedConnection.handleReads(MemcachedConnection.java:804)
>     at net.spy.memcached.MemcachedConnection.handleReadsAndWrites(MemcachedConnection.java:684)
>     at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:647)
>     at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:418)
>     at net.spy.memcached.MemcachedConnection.run(MemcachedConnection.java:1400)
> 2014-05-03 23:40:42.242 WARN net.spy.memcached.MemcachedConnection: Closing, and reopening {QA sa=/10.0.0.24:11234, #Rops=2, #Wops=0, #iq=0, topRop=Cmd: 1 Opaque: 10957 Key: test_key5478 Cas: 0 Exp: 0 Flags: 0 Data Length: 804, topWop=null, toWrite=0, interested=1}, attempt 0.
> 2014-05-03 23:40:42.242 WARN net.spy.memcached.protocol.binary.BinaryMemcachedNodeImpl: Discarding partially completed op: Cmd: 1 Opaque: 10957 Key: test_key5478 Cas: 0 Exp: 0 Flags: 0 Data Length: 804
> 2014-05-03 23:40:42.242 WARN net.spy.memcached.protocol.binary.BinaryMemcachedNodeImpl: Discarding partially completed op: Cmd: 0 Opaque: 10958 Key: test_key5478
> java.util.concurrent.ExecutionException: java.util.concurrent.CancellationException: Cancelled
>     at net.spy.memcached.internal.OperationFuture.get(OperationFuture.java:177)
>     at net.spy.memcached.internal.GetFuture.get(GetFuture.java:69)
>     at toolcb.go(toolcb.java:45)
>     at toolcb.main(toolcb.java:14)
> Caused by: java.util.concurrent.CancellationException: Cancelled
>     ... 4 more
> test_key5479 true
> test_key5480 true
> test_key5481 true
> test_key5482 true
> test_key5483 true
> test_key5484 true
> test_key5485 true
> test_key5486 true
> test_key5487 true
> 2014-05-03 23:40:42.318 ERROR net.spy.memcached.protocol.binary.StoreOperationImpl: Error: Internal error
> 2014-05-03 23:40:42.319 INFO net.spy.memcached.MemcachedConnection: Reconnection due to exception handling a memcached operation on {QA sa=/10.0.0.20:11234, #Rops=2, #Wops=0, #iq=0, topRop=Cmd: 1 Opaque: 10977 Key: test_key5488 Cas: 0 Exp: 0 Flags: 0 Data Length: 804, topWop=null, toWrite=0, interested=1}. This may be due to an authentication failure.
> OperationException: SERVER: Internal error
>     at net.spy.memcached.protocol.BaseOperationImpl.handleError(BaseOperationImpl.java:192)
>     at net.spy.memcached.protocol.binary.OperationImpl.getStatusForErrorCode(OperationImpl.java:244)
>     at net.spy.memcached.protocol.binary.OperationImpl.finishedPayload(OperationImpl.java:201)
>     at net.spy.memcached.protocol.binary.OperationImpl.readPayloadFromBuffer(OperationImpl.java:196)
>     at net.spy.memcached.protocol.binary.OperationImpl.readFromBuffer(OperationImpl.java:139)
>     at net.spy.memcached.MemcachedConnection.readBufferAndLogMetrics(MemcachedConnection.java:825)
>     at net.spy.memcached.MemcachedConnection.handleReads(MemcachedConnection.java:804)
>     at net.spy.memcached.MemcachedConnection.handleReadsAndWrites(MemcachedConnection.java:684)
>     at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:647)
>     at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:418)
>     at net.spy.memcached.MemcachedConnection.run(MemcachedConnection.java:1400)
> 2014-05-03 23:40:42.320 WARN net.spy.memcached.MemcachedConnection: Closing, and reopening {QA sa=/10.0.0.20:11234, #Rops=2, #Wops=0, #iq=0, topRop=Cmd: 1 Opaque: 10977 Key: test_key5488 Cas: 0 Exp: 0 Flags: 0 Data Length: 804, topWop=null, toWrite=0, interested=1}, attempt 0.
> 2014-05-03 23:40:42.320 WARN net.spy.memcached.protocol.binary.BinaryMemcachedNodeImpl: Discarding partially completed op: Cmd: 1 Opaque: 10977 Key: test_key5488 Cas: 0 Exp: 0 Flags: 0 Data Length: 804
> 2014-05-03 23:40:42.320 WARN net.spy.memcached.protocol.binary.BinaryMemcachedNodeImpl: Discarding partially completed op: Cmd: 0 Opaque: 10978 Key: test_key5488
> java.util.concurrent.ExecutionException: java.util.concurrent.CancellationException: Cancelled
>     at net.spy.memcached.internal.OperationFuture.get(OperationFuture.java:177)
>     at net.spy.memcached.internal.GetFuture.get(GetFuture.java:69)
>     at toolcb.go(toolcb.java:45)
>     at toolcb.main(toolcb.java:14)
> Caused by: java.util.concurrent.CancellationException: Cancelled
>     ... 4 more
> test_key5489 true
>
> On Friday, 2 May 2014 23:41:57 UTC+7, Matt Ingenthron wrote:
>>
>> Hi Phuc,
>>
>> From: Phuc Huu <[email protected]>
>> Reply-To: "[email protected]" <[email protected]>
>> Date: Friday, May 2, 2014 at 3:03 AM
>> To: "[email protected]" <[email protected]>
>> Subject: How does a Couchbase cluster respond when one node is down?
>>
>> I'm testing Couchbase Server 2.5. I have a cluster of 7 nodes with 3
>> replicas. Under normal conditions the system works fine.
>>
>> But I failed with this test case: the cluster is serving 40,000 ops and I
>> stop the Couchbase service on one server, so one node is down. After that,
>> the entire cluster's performance drops painfully; it can only serve below
>> 1,000 ops. When I click fail-over, the whole cluster returns to healthy.
>>
>>
>> There's a lot of missing information here, like version of the cluster,
>> what client you're using, what the workload is.
>>
>> Based on what you say, I suspect your test is just running in a tight
>> loop with random keys. Before the failover, this means it will try to open
>> the connection and it will wait some time to try to get a response from
>> that node. The configuration is telling the client that the node should be
>> part of the cluster.
>> That additional latency inserted into the tight loop would give you the
>> drop in throughput.
>>
>> Consider refactoring your test so the workload generation is constant,
>> and you should see a drop in throughput commensurate with the small
>> reduction in nodes.
>>
>> Or, if you want to test this theory more simply, just pick a few random
>> keys and hit each in its own tight loop. When you stop the service on the
>> one node, those should maintain the same throughput.
>>
>>
>> Is this the right behavior for a Couchbase cluster when one node is down?
>> The cluster loses nearly all performance until I fail over.
>>
>>
>> I'm highly confident that Couchbase works correctly here. I think you're
>> seeing the drop in throughput from your workload generator hitting
>> timeouts.
>>
>> Hope that helps,
>>
>> Matt
>>
>> --
>> Matt Ingenthron
>> Couchbase, Inc.
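
As an illustration of the fixed-key workload Matt suggests, a minimal sketch is below. It reuses the spymemcached client and node addresses from the thread, but the class name, key names, payload, and the 500 ms per-attempt budget are made-up values for the sketch. The point is only that every Future.get() is bounded, so a key whose node has stopped responding costs a short timeout per attempt instead of stalling the whole loop, while keys served by healthy nodes keep completing at their normal rate.

import java.util.concurrent.TimeUnit;

import net.spy.memcached.AddrUtil;
import net.spy.memcached.BinaryConnectionFactory;
import net.spy.memcached.MemcachedClient;

public class FixedKeyWorkload {
    public static void main(String[] args) throws Exception {
        // Same client and node list as in the original test.
        MemcachedClient c = new MemcachedClient(
                new BinaryConnectionFactory(),
                AddrUtil.getAddresses(
                        "10.0.0.20:11234 10.0.0.23:11234 10.0.0.24:11234"));

        String value = "payload";                              // placeholder payload
        long opBudgetMs = 500;                                 // illustrative per-operation wait
        String[] keys = {"key_a", "key_b", "key_c", "key_d"};  // small fixed key set

        long ok = 0;
        long failed = 0;
        long start = System.currentTimeMillis();

        for (int round = 0; round < 1000; round++) {
            for (String key : keys) {
                try {
                    // Bound both waits instead of blocking indefinitely, so an
                    // unresponsive node costs at most ~2 * opBudgetMs per attempt.
                    c.set(key, 0, value).get(opBudgetMs, TimeUnit.MILLISECONDS);
                    c.asyncGet(key).get(opBudgetMs, TimeUnit.MILLISECONDS);
                    ok++;
                } catch (Exception e) {
                    // Timeouts/cancellations for keys on the stopped node land here;
                    // keys on the healthy nodes keep completing quickly.
                    failed++;
                }
            }
        }

        long seconds = Math.max(1, (System.currentTimeMillis() - start) / 1000);
        System.out.println("ok=" + ok + ", failed=" + failed + ", elapsed=" + seconds + "s");
        c.shutdown();
    }
}

Running it once against the healthy cluster and again after stopping one node, the failed counter should roughly track the share of keys that map to the stopped node, while the remaining keys keep their throughput, which is the behaviour Matt describes.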
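
Separately, on the "additional latency inserted into the tight loop" point: spymemcached's ConnectionFactoryBuilder exposes the knobs that control how long operations wait and how the client treats a node it cannot reach. The thread does not prescribe any of this; the sketch below only shows where those settings live, with an illustrative 1-second operation timeout (Redistribute is the library's default failure mode).

import net.spy.memcached.AddrUtil;
import net.spy.memcached.ConnectionFactoryBuilder;
import net.spy.memcached.ConnectionFactoryBuilder.Protocol;
import net.spy.memcached.FailureMode;
import net.spy.memcached.MemcachedClient;

public class BoundedTimeoutClient {
    public static MemcachedClient build() throws Exception {
        ConnectionFactoryBuilder builder = new ConnectionFactoryBuilder()
                .setProtocol(Protocol.BINARY)               // same binary protocol as BinaryConnectionFactory
                .setOpTimeout(1000)                         // illustrative 1 s cap on synchronous operations
                .setFailureMode(FailureMode.Redistribute);  // how ops for an unreachable node are handled
        return new MemcachedClient(
                builder.build(),
                AddrUtil.getAddresses(
                        "10.0.0.20:11234 10.0.0.23:11234 10.0.0.24:11234"));
    }
}

A client built this way behaves like the BinaryConnectionFactory one above, except that synchronous operations give up after the configured timeout rather than the library default.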
