Why is it that when I stop the Couchbase service on one server in the cluster, I can no longer connect to the remaining servers, as the error log in my previous message shows? Can you help me explain this? Thanks so much.
On Sunday, 4 May 2014 01:36:41 UTC+7, Phuc Huu wrote:
>
> Hi Matt,
>
> Here is some more information:
>
> 1. "There's a lot of missing information here, like version of the cluster,
> what client you're using, what the workload is":
>
> Cluster: Couchbase 2.5
> Client: spymemcached 2.8.4
>
> The tool creates 200 threads that connect to the Couchbase cluster. Each
> thread Sets a key and immediately Gets it back to check it; if that
> succeeds it moves on to the next key, otherwise it retries the Set/Get and
> skips the key after 5 failed attempts.
>
> I can see the cluster's throughput drop in the Couchbase Web Console at
> http://ip:8091/.
>
> 2. I rewrote the loop as a simpler example, but it fails too. Normally I
> get 300-400 ops, but when one server is down it only serves 20-30 ops.
>
> My code:
>
> import java.util.concurrent.Future;
> import java.util.concurrent.TimeUnit;
>
> import net.spy.memcached.AddrUtil;
> import net.spy.memcached.BinaryConnectionFactory;
> import net.spy.memcached.MemcachedClient;
>
> public class toolcb {
>
>     public static void main(String[] args) {
>         new toolcb().go();
>     }
>
>     void go() {
>         String value = "...";  // test payload (definition not shown in the original post)
>         try {
>             MemcachedClient c = new MemcachedClient(
>                     new BinaryConnectionFactory(),
>                     AddrUtil.getAddresses(
>                             "10.0.0.20:11234 10.0.0.23:11234 10.0.0.24:11234"));
>
>             for (int i = 0; i < 3000; i++) {
>                 String ini_key = "test_key";
>                 String key = ini_key + i;
>                 Future<Object> f = null;
>                 try {
>                     c.set(key, 0, value);
>                     f = c.asyncGet(key);
>
>                     Object result = f.get(5, TimeUnit.SECONDS);
>                     boolean check = f.isDone();
>
>                     if (check) {
>                         System.out.println(key + " " + check);
>                     }
>                 } catch (Exception e) {
>                     e.printStackTrace();
>                     if (f != null) {  // guard: set() may fail before asyncGet() runs
>                         f.cancel(false);
>                     }
>                 }
>             }
>         } catch (Exception ex) {
>             ex.printStackTrace();
>         }
>     }
> }
>
> This is the log output (in this case I stopped the Couchbase service on
> server 10.0.0.28; I expected a connection problem only for that server, but
> connection errors are reported for every server in the cluster):
>
> 2014-05-03 23:40:42.241 ERROR net.spy.memcached.protocol.binary.StoreOperationImpl: Error: Internal error
> 2014-05-03 23:40:42.242 INFO net.spy.memcached.MemcachedConnection: Reconnection due to exception handling a memcached operation on {QA sa=/10.0.0.24:11234, #Rops=2, #Wops=0, #iq=0, topRop=Cmd: 1 Opaque: 10957 Key: test_key5478 Cas: 0 Exp: 0 Flags: 0 Data Length: 804, topWop=null, toWrite=0, interested=1}. This may be due to an authentication failure.
> OperationException: SERVER: Internal error
>     at net.spy.memcached.protocol.BaseOperationImpl.handleError(BaseOperationImpl.java:192)
>     at net.spy.memcached.protocol.binary.OperationImpl.getStatusForErrorCode(OperationImpl.java:244)
>     at net.spy.memcached.protocol.binary.OperationImpl.finishedPayload(OperationImpl.java:201)
>     at net.spy.memcached.protocol.binary.OperationImpl.readPayloadFromBuffer(OperationImpl.java:196)
>     at net.spy.memcached.protocol.binary.OperationImpl.readFromBuffer(OperationImpl.java:139)
>     at net.spy.memcached.MemcachedConnection.readBufferAndLogMetrics(MemcachedConnection.java:825)
>     at net.spy.memcached.MemcachedConnection.handleReads(MemcachedConnection.java:804)
>     at net.spy.memcached.MemcachedConnection.handleReadsAndWrites(MemcachedConnection.java:684)
>     at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:647)
>     at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:418)
>     at net.spy.memcached.MemcachedConnection.run(MemcachedConnection.java:1400)
> 2014-05-03 23:40:42.242 WARN net.spy.memcached.MemcachedConnection: Closing, and reopening {QA sa=/10.0.0.24:11234, #Rops=2, #Wops=0, #iq=0, topRop=Cmd: 1 Opaque: 10957 Key: test_key5478 Cas: 0 Exp: 0 Flags: 0 Data Length: 804, topWop=null, toWrite=0, interested=1}, attempt 0.
> 2014-05-03 23:40:42.242 WARN net.spy.memcached.protocol.binary.BinaryMemcachedNodeImpl: Discarding partially completed op: Cmd: 1 Opaque: 10957 Key: test_key5478 Cas: 0 Exp: 0 Flags: 0 Data Length: 804
> 2014-05-03 23:40:42.242 WARN net.spy.memcached.protocol.binary.BinaryMemcachedNodeImpl: Discarding partially completed op: Cmd: 0 Opaque: 10958 Key: test_key5478
> java.util.concurrent.ExecutionException: java.util.concurrent.CancellationException: Cancelled
>     at net.spy.memcached.internal.OperationFuture.get(OperationFuture.java:177)
>     at net.spy.memcached.internal.GetFuture.get(GetFuture.java:69)
>     at toolcb.go(toolcb.java:45)
>     at toolcb.main(toolcb.java:14)
> Caused by: java.util.concurrent.CancellationException: Cancelled
>     ... 4 more
> test_key5479 true
> test_key5480 true
> test_key5481 true
> test_key5482 true
> test_key5483 true
> test_key5484 true
> test_key5485 true
> test_key5486 true
> test_key5487 true
> 2014-05-03 23:40:42.318 ERROR net.spy.memcached.protocol.binary.StoreOperationImpl: Error: Internal error
> 2014-05-03 23:40:42.319 INFO net.spy.memcached.MemcachedConnection: Reconnection due to exception handling a memcached operation on {QA sa=/10.0.0.20:11234, #Rops=2, #Wops=0, #iq=0, topRop=Cmd: 1 Opaque: 10977 Key: test_key5488 Cas: 0 Exp: 0 Flags: 0 Data Length: 804, topWop=null, toWrite=0, interested=1}. This may be due to an authentication failure.
> OperationException: SERVER: Internal error
>     at net.spy.memcached.protocol.BaseOperationImpl.handleError(BaseOperationImpl.java:192)
>     at net.spy.memcached.protocol.binary.OperationImpl.getStatusForErrorCode(OperationImpl.java:244)
>     at net.spy.memcached.protocol.binary.OperationImpl.finishedPayload(OperationImpl.java:201)
>     at net.spy.memcached.protocol.binary.OperationImpl.readPayloadFromBuffer(OperationImpl.java:196)
>     at net.spy.memcached.protocol.binary.OperationImpl.readFromBuffer(OperationImpl.java:139)
>     at net.spy.memcached.MemcachedConnection.readBufferAndLogMetrics(MemcachedConnection.java:825)
>     at net.spy.memcached.MemcachedConnection.handleReads(MemcachedConnection.java:804)
>     at net.spy.memcached.MemcachedConnection.handleReadsAndWrites(MemcachedConnection.java:684)
>     at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:647)
>     at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:418)
>     at net.spy.memcached.MemcachedConnection.run(MemcachedConnection.java:1400)
> 2014-05-03 23:40:42.320 WARN net.spy.memcached.MemcachedConnection: Closing, and reopening {QA sa=/10.0.0.20:11234, #Rops=2, #Wops=0, #iq=0, topRop=Cmd: 1 Opaque: 10977 Key: test_key5488 Cas: 0 Exp: 0 Flags: 0 Data Length: 804, topWop=null, toWrite=0, interested=1}, attempt 0.
> 2014-05-03 23:40:42.320 WARN net.spy.memcached.protocol.binary.BinaryMemcachedNodeImpl: Discarding partially completed op: Cmd: 1 Opaque: 10977 Key: test_key5488 Cas: 0 Exp: 0 Flags: 0 Data Length: 804
> 2014-05-03 23:40:42.320 WARN net.spy.memcached.protocol.binary.BinaryMemcachedNodeImpl: Discarding partially completed op: Cmd: 0 Opaque: 10978 Key: test_key5488
> java.util.concurrent.ExecutionException: java.util.concurrent.CancellationException: Cancelled
>     at net.spy.memcached.internal.OperationFuture.get(OperationFuture.java:177)
>     at net.spy.memcached.internal.GetFuture.get(GetFuture.java:69)
>     at toolcb.go(toolcb.java:45)
>     at toolcb.main(toolcb.java:14)
> Caused by: java.util.concurrent.CancellationException: Cancelled
>     ... 4 more
> test_key5489 true
>
> On Friday, 2 May 2014 23:41:57 UTC+7, Matt Ingenthron wrote:
>>
>> Hi Phuc,
>>
>> From: Phuc Huu <[email protected]>
>> Reply-To: "[email protected]" <[email protected]>
>> Date: Friday, May 2, 2014 at 3:03 AM
>> To: "[email protected]" <[email protected]>
>> Subject: How does a Couchbase cluster respond when one node is down?
>>
>> I'm testing Couchbase Server 2.5. I have a cluster of 7 nodes with 3
>> replicas. Under normal conditions the system works fine.
>>
>> But I failed with this test case: the cluster is serving 40,000 ops and I
>> stop the Couchbase service on one server, so one node is down. After that,
>> the entire cluster's performance drops painfully; it can only serve below
>> 1,000 ops. When I click fail-over, the whole cluster returns to healthy.
>>
>>
>> There's a lot of missing information here, like version of the cluster,
>> what client you're using, what the workload is.
>>
>> Based on what you say, I suspect your test is just running in a tight
>> loop with random keys. Before the failover, this means it will try to open
>> the connection and it will wait some time to try to get a response from
>> that node. The configuration is telling the client that the node should be
>> part of the cluster.
>> That additional latency inserted into the tight loop would give you the
>> drop in throughput.
>>
>> Consider refactoring your test so the workload generation is constant,
>> and you should see a drop in throughput commensurate with the small
>> reduction in nodes.
>>
>> Or, if you want to test this theory more simply, just pick a few random
>> keys and hit each in its own tight loop. When you stop the service on the
>> one node, those should maintain the same throughput.
>>
>>
>> Is this the right behavior for a Couchbase cluster when one node is down?
>> The cluster loses nearly all performance until I fail over.
>>
>>
>> I'm highly confident that Couchbase works correctly here. I think you're
>> seeing the drop in throughput from your workload generator hitting
>> timeouts.
>>
>> Hope that helps,
>>
>> Matt
>>
>> --
>> Matt Ingenthron
>> Couchbase, Inc.
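
As an illustration of the fixed-key workload Matt suggests, a minimal sketch is below. It reuses the spymemcached client and node addresses from the thread, but the class name, key names, payload, and the 500 ms per-attempt budget are made-up values for the sketch. The point is only that every Future.get() is bounded, so a key whose node has stopped responding costs a short timeout per attempt instead of stalling the whole loop, while keys served by healthy nodes keep completing at their normal rate.

import java.util.concurrent.TimeUnit;

import net.spy.memcached.AddrUtil;
import net.spy.memcached.BinaryConnectionFactory;
import net.spy.memcached.MemcachedClient;

public class FixedKeyWorkload {
    public static void main(String[] args) throws Exception {
        // Same client and node list as in the original test.
        MemcachedClient c = new MemcachedClient(
                new BinaryConnectionFactory(),
                AddrUtil.getAddresses(
                        "10.0.0.20:11234 10.0.0.23:11234 10.0.0.24:11234"));

        String value = "payload";                              // placeholder payload
        long opBudgetMs = 500;                                 // illustrative per-operation wait
        String[] keys = {"key_a", "key_b", "key_c", "key_d"};  // small fixed key set

        long ok = 0;
        long failed = 0;
        long start = System.currentTimeMillis();

        for (int round = 0; round < 1000; round++) {
            for (String key : keys) {
                try {
                    // Bound both waits instead of blocking indefinitely, so an
                    // unresponsive node costs at most ~2 * opBudgetMs per attempt.
                    c.set(key, 0, value).get(opBudgetMs, TimeUnit.MILLISECONDS);
                    c.asyncGet(key).get(opBudgetMs, TimeUnit.MILLISECONDS);
                    ok++;
                } catch (Exception e) {
                    // Timeouts/cancellations for keys on the stopped node land here;
                    // keys on the healthy nodes keep completing quickly.
                    failed++;
                }
            }
        }

        long seconds = Math.max(1, (System.currentTimeMillis() - start) / 1000);
        System.out.println("ok=" + ok + ", failed=" + failed + ", elapsed=" + seconds + "s");
        c.shutdown();
    }
}

Running it once against the healthy cluster and again after stopping one node, the failed counter should roughly track the share of keys that map to the stopped node, while the remaining keys keep their throughput, which is the behaviour Matt describes.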
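
Separately, on the "additional latency inserted into the tight loop" point: spymemcached's ConnectionFactoryBuilder exposes the knobs that control how long operations wait and how the client treats a node it cannot reach. The thread does not prescribe any of this; the sketch below only shows where those settings live, with an illustrative 1-second operation timeout (Redistribute is the library's default failure mode).

import net.spy.memcached.AddrUtil;
import net.spy.memcached.ConnectionFactoryBuilder;
import net.spy.memcached.ConnectionFactoryBuilder.Protocol;
import net.spy.memcached.FailureMode;
import net.spy.memcached.MemcachedClient;

public class BoundedTimeoutClient {
    public static MemcachedClient build() throws Exception {
        ConnectionFactoryBuilder builder = new ConnectionFactoryBuilder()
                .setProtocol(Protocol.BINARY)               // same binary protocol as BinaryConnectionFactory
                .setOpTimeout(1000)                         // illustrative 1 s cap on synchronous operations
                .setFailureMode(FailureMode.Redistribute);  // how ops for an unreachable node are handled
        return new MemcachedClient(
                builder.build(),
                AddrUtil.getAddresses(
                        "10.0.0.20:11234 10.0.0.23:11234 10.0.0.24:11234"));
    }
}

A client built this way behaves like the BinaryConnectionFactory one above, except that synchronous operations give up after the configured timeout rather than the library default.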
