Re: How does a Couchbase cluster respond when one node is down?

Phuc Huu Sat, 03 May 2014 11:33:07 -0700

Hi Matt,

Here is more information:


1. There’s a lot of missing information here, like version of the cluster, 
what client you’re using, what the workload is:

Cluster: Couchbase 2.5
Client: Spymemcached 2.8.4

My test tool creates 200 threads that connect to the Couchbase cluster. Each 
thread sets a key and immediately gets it back to check it. On success it 
continues with the next key. On failure it retries the Set/Get, and it skips 
the key after 5 failed attempts.
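
A minimal sketch of that per-thread Set/Get/retry logic, to make the behavior 
concrete. It is not the tool's real code: KeyValueStore, flakyStore and 
setAndVerify are hypothetical names, and an in-memory store stands in for the 
Spymemcached client so the sketch runs on its own.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class SetGetRetrySketch {

    /** Hypothetical stand-in for the real client (not a Spymemcached type). */
    interface KeyValueStore {
        void set(String key, String value) throws Exception;
        String get(String key) throws Exception;
    }

    /** The tool gives up on a key after this many failed attempts. */
    static final int MAX_ATTEMPTS = 5;

    /**
     * Sets the key, reads it back and compares the value.
     * Returns true once a round trip succeeds, or false if the key
     * is skipped after MAX_ATTEMPTS failures.
     */
    static boolean setAndVerify(KeyValueStore store, String key, String value) {
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            try {
                store.set(key, value);
                if (value.equals(store.get(key))) {
                    return true;
                }
            } catch (Exception e) {
                // Count any error as a failed attempt and try again.
            }
        }
        return false;
    }

    /** In-memory store whose first failuresPerKey set() calls per key throw. */
    static KeyValueStore flakyStore(final int failuresPerKey) {
        final Map<String, String> data = new ConcurrentHashMap<>();
        final Map<String, AtomicInteger> calls = new ConcurrentHashMap<>();
        return new KeyValueStore() {
            @Override
            public void set(String key, String value) throws Exception {
                AtomicInteger seen = calls.computeIfAbsent(key, k -> new AtomicInteger());
                if (seen.getAndIncrement() < failuresPerKey) {
                    throw new Exception("simulated failure for " + key);
                }
                data.put(key, value);
            }

            @Override
            public String get(String key) {
                return data.get(key);
            }
        };
    }

    public static void main(String[] args) {
        // Two failures, then success on the third attempt.
        System.out.println(setAndVerify(flakyStore(2), "test_key0", "v")); // true
        // Five failures exhaust the retries, so the key is skipped.
        System.out.println(setAndVerify(flakyStore(5), "test_key1", "v")); // false
    }
}
```

In the real tool each of the 200 threads would run a loop like this over its 
own keys, with the actual client behind the interface.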

I see the cluster's throughput drop in the Couchbase Web Console (http://ip:8091/).

2. I rewrote the loop as a standalone example, but it failed too. Normally I 
get 300-400 ops, but when a server is down it only serves 20-30 ops.

My code:
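        // Imports needed: net.spy.memcached.AddrUtil,
        // net.spy.memcached.BinaryConnectionFactory,
        // net.spy.memcached.MemcachedClient,
        // java.util.concurrent.Future, java.util.concurrent.TimeUnit
        try {
            MemcachedClient c = new MemcachedClient(
                    new BinaryConnectionFactory(),
                    AddrUtil.getAddresses(
                            "10.0.0.20:11234 10.0.0.23:11234 10.0.0.24:11234"));

            // value is the payload object, defined elsewhere in the tool
            for (int i = 0; i < 3000; i++) {
                String ini_key = "test_key";
                String key = ini_key + i;
                Future<Object> f = null;
                try {
                    // Store the key, then read it back right away
                    c.set(key, 0, value);
                    f = c.asyncGet(key);

                    // Wait up to 5 seconds for the read to complete
                    Object result = f.get(5, TimeUnit.SECONDS);
                    boolean check = f.isDone();

                    if (check) {
                        System.out.println(key + " " + check);
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                    // f is still null if set() or asyncGet() threw
                    if (f != null) {
                        f.cancel(false);
                    }
                }
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }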

This is the log output. In this case I stopped the Couchbase service on server 
10.0.0.28. I would expect connection problems only with that server, but the 
log shows connection errors for all servers in the cluster:

2014-05-03 23:40:42.241 ERROR net.spy.memcached.protocol.binary.StoreOperationImpl:  Error:  Internal error
2014-05-03 23:40:42.242 INFO net.spy.memcached.MemcachedConnection:  Reconnection due to exception handling a memcached operation on {QA sa=/10.0.0.24:11234, #Rops=2, #Wops=0, #iq=0, topRop=Cmd: 1 Opaque: 10957 Key: test_key5478 Cas: 0 Exp: 0 Flags: 0 Data Length: 804, topWop=null, toWrite=0, interested=1}. This may be due to an authentication failure.
OperationException: SERVER: Internal error
    at net.spy.memcached.protocol.BaseOperationImpl.handleError(BaseOperationImpl.java:192)
    at net.spy.memcached.protocol.binary.OperationImpl.getStatusForErrorCode(OperationImpl.java:244)
    at net.spy.memcached.protocol.binary.OperationImpl.finishedPayload(OperationImpl.java:201)
    at net.spy.memcached.protocol.binary.OperationImpl.readPayloadFromBuffer(OperationImpl.java:196)
    at net.spy.memcached.protocol.binary.OperationImpl.readFromBuffer(OperationImpl.java:139)
    at net.spy.memcached.MemcachedConnection.readBufferAndLogMetrics(MemcachedConnection.java:825)
    at net.spy.memcached.MemcachedConnection.handleReads(MemcachedConnection.java:804)
    at net.spy.memcached.MemcachedConnection.handleReadsAndWrites(MemcachedConnection.java:684)
    at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:647)
    at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:418)
    at net.spy.memcached.MemcachedConnection.run(MemcachedConnection.java:1400)
2014-05-03 23:40:42.242 WARN net.spy.memcached.MemcachedConnection:  Closing, and reopening {QA sa=/10.0.0.24:11234, #Rops=2, #Wops=0, #iq=0, topRop=Cmd: 1 Opaque: 10957 Key: test_key5478 Cas: 0 Exp: 0 Flags: 0 Data Length: 804, topWop=null, toWrite=0, interested=1}, attempt 0.
2014-05-03 23:40:42.242 WARN net.spy.memcached.protocol.binary.BinaryMemcachedNodeImpl:  Discarding partially completed op: Cmd: 1 Opaque: 10957 Key: test_key5478 Cas: 0 Exp: 0 Flags: 0 Data Length: 804
2014-05-03 23:40:42.242 WARN net.spy.memcached.protocol.binary.BinaryMemcachedNodeImpl:  Discarding partially completed op: Cmd: 0 Opaque: 10958 Key: test_key5478
java.util.concurrent.ExecutionException: java.util.concurrent.CancellationException: Cancelled
    at net.spy.memcached.internal.OperationFuture.get(OperationFuture.java:177)
    at net.spy.memcached.internal.GetFuture.get(GetFuture.java:69)
    at toolcb.go(toolcb.java:45)
    at toolcb.main(toolcb.java:14)
Caused by: java.util.concurrent.CancellationException: Cancelled
    ... 4 more
test_key5479 true
test_key5480 true
test_key5481 true
test_key5482 true
test_key5483 true
test_key5484 true
test_key5485 true
test_key5486 true
test_key5487 true
2014-05-03 23:40:42.318 ERROR net.spy.memcached.protocol.binary.StoreOperationImpl:  Error:  Internal error
2014-05-03 23:40:42.319 INFO net.spy.memcached.MemcachedConnection:  Reconnection due to exception handling a memcached operation on {QA sa=/10.0.0.20:11234, #Rops=2, #Wops=0, #iq=0, topRop=Cmd: 1 Opaque: 10977 Key: test_key5488 Cas: 0 Exp: 0 Flags: 0 Data Length: 804, topWop=null, toWrite=0, interested=1}. This may be due to an authentication failure.
OperationException: SERVER: Internal error
    at net.spy.memcached.protocol.BaseOperationImpl.handleError(BaseOperationImpl.java:192)
    at net.spy.memcached.protocol.binary.OperationImpl.getStatusForErrorCode(OperationImpl.java:244)
    at net.spy.memcached.protocol.binary.OperationImpl.finishedPayload(OperationImpl.java:201)
    at net.spy.memcached.protocol.binary.OperationImpl.readPayloadFromBuffer(OperationImpl.java:196)
    at net.spy.memcached.protocol.binary.OperationImpl.readFromBuffer(OperationImpl.java:139)
    at net.spy.memcached.MemcachedConnection.readBufferAndLogMetrics(MemcachedConnection.java:825)
    at net.spy.memcached.MemcachedConnection.handleReads(MemcachedConnection.java:804)
    at net.spy.memcached.MemcachedConnection.handleReadsAndWrites(MemcachedConnection.java:684)
    at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:647)
    at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:418)
    at net.spy.memcached.MemcachedConnection.run(MemcachedConnection.java:1400)
2014-05-03 23:40:42.320 WARN net.spy.memcached.MemcachedConnection:  Closing, and reopening {QA sa=/10.0.0.20:11234, #Rops=2, #Wops=0, #iq=0, topRop=Cmd: 1 Opaque: 10977 Key: test_key5488 Cas: 0 Exp: 0 Flags: 0 Data Length: 804, topWop=null, toWrite=0, interested=1}, attempt 0.
2014-05-03 23:40:42.320 WARN net.spy.memcached.protocol.binary.BinaryMemcachedNodeImpl:  Discarding partially completed op: Cmd: 1 Opaque: 10977 Key: test_key5488 Cas: 0 Exp: 0 Flags: 0 Data Length: 804
2014-05-03 23:40:42.320 WARN net.spy.memcached.protocol.binary.BinaryMemcachedNodeImpl:  Discarding partially completed op: Cmd: 0 Opaque: 10978 Key: test_key5488
java.util.concurrent.ExecutionException: java.util.concurrent.CancellationException: Cancelled
    at net.spy.memcached.internal.OperationFuture.get(OperationFuture.java:177)
    at net.spy.memcached.internal.GetFuture.get(GetFuture.java:69)
    at toolcb.go(toolcb.java:45)
    at toolcb.main(toolcb.java:14)
Caused by: java.util.concurrent.CancellationException: Cancelled
    ... 4 more
test_key5489 true
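
The excerpt above shows reconnects against 10.0.0.24 and 10.0.0.20. To see 
which nodes are affected over a complete log, the "Closing, and reopening" 
lines can be counted per node address. This is only a rough sketch: the class 
and method names are hypothetical, and the pattern is based solely on the line 
format in this log (it tolerates the line wrapping added by the mail client).

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReconnectCounter {

    // Matches e.g. "Closing, and reopening {QA sa=/10.0.0.24:11234" and
    // captures the host:port part. \s+ also matches wrapped line breaks.
    private static final Pattern REOPEN = Pattern.compile(
            "Closing,\\s+and\\s+reopening\\s+\\{QA\\s+sa=/([0-9.]+:\\d+)");

    /** Returns the number of "Closing, and reopening" events per node. */
    static Map<String, Integer> countReopens(String log) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher m = REOPEN.matcher(log);
        while (m.find()) {
            counts.merge(m.group(1), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Two shortened lines taken from the log in this message.
        String log =
                "2014-05-03 23:40:42.242 WARN net.spy.memcached.MemcachedConnection:  \n"
              + "Closing, and reopening {QA sa=/10.0.0.24:11234, #Rops=2, #Wops=0}, attempt 0.\n"
              + "2014-05-03 23:40:42.320 WARN net.spy.memcached.MemcachedConnection:  \n"
              + "Closing, and reopening {QA sa=/10.0.0.20:11234, #Rops=2, #Wops=0}, attempt 0.\n";
        // Prints {10.0.0.20:11234=1, 10.0.0.24:11234=1}
        System.out.println(countReopens(log));
    }
}
```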

On Friday, 2 May 2014 17:03:27 UTC+7, Phuc Huu wrote:
>
> I'm testing Couchbase Server 2.5. I have a cluster with 7 nodes and 3 
> replicas. Under normal conditions, the system works fine.
>
> But it fails this test case: the Couchbase cluster is serving 40,000 ops 
> and I stop the Couchbase service on one server, so one node goes down. 
> After that, the entire cluster's performance drops sharply. It can only 
> serve fewer than 1,000 ops. When I click fail-over, the entire cluster 
> returns to a healthy state.
>
> Is this the correct behavior for a Couchbase cluster when one node goes 
> down? The cluster loses nearly all of its performance until I fail the 
> node over.
>
