OK, I understand the first issue now (this might help someone else).

When the first node goes down, the first get still sends its request
over the network. At that point, everything looks fine from the
client's point of view: the socket is valid and the request is sent,
so it doesn't look at the replica. After that, it tries to fetch the
answer. It calls ::recv on the socket and times out, since the server
does not answer. When this happens, the server is marked as timed out
and the socket as invalid, and that's all. It doesn't ask the replica.
The replica can only be queried in the first part, when ::send is
attempted on an invalidated socket. That's why the second get targets
the replica.
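
To make the sequence above concrete, here is a minimal toy model in C.
It is only a sketch of the behavior as I described it, not libmemcached
code: toy_server, toy_result and toy_get are hypothetical names made up
for this illustration, and the real library logic is more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these types and functions are hypothetical and are
 * NOT part of libmemcached. */
typedef struct {
    bool process_up;   /* is the memcached process actually running? */
    bool socket_valid; /* does the client still consider its socket usable? */
    bool has_key;      /* does this node hold the requested key? */
} toy_server;

typedef enum { TOY_HIT, TOY_MISS, TOY_TIMEOUT } toy_result;

/* One get. servers[0] is the primary node, servers[1..] are replicas. */
toy_result toy_get(toy_server *servers, size_t count)
{
    assert(servers != NULL && count > 0);

    for (size_t i = 0; i < count; i++) {
        toy_server *s = &servers[i];

        /* Send phase: an already invalidated socket is the only
         * failure that moves the request on to the next replica. */
        if (!s->socket_valid)
            continue;

        /* The send looked fine to the client. Receive phase: a dead
         * server never answers, so the read times out. The socket is
         * invalidated and the get fails without asking the replica. */
        if (!s->process_up) {
            s->socket_valid = false;
            return TOY_TIMEOUT;
        }

        if (s->has_key)
            return TOY_HIT;
        /* A plain miss falls through to the next replica. */
    }
    return TOY_MISS;
}
```

With a dead primary and a live replica that holds the key, the first
toy_get call fails with TOY_TIMEOUT and invalidates the primary's
socket. The second call skips the primary in the send phase and returns
TOY_HIT from the replica, which matches what I observed.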

But I still don't understand why the server does not receive any
queries once it is up again. The only solution I found is to reload
the configuration (which means cleaning the memcached_st), but that's
not good. I'll play with gdb to understand this part too.

Cheers
Nico

On 6 June, 19:05, Nicolas Motte <[email protected]> wrote:
> Hi,
>
> I went through all the libmemcached code (v0.51).
> I wrote a simple test to check that I understood the behavior of
> the lib correctly.
>
> Here is my test case:
>
> I start 3 memcached nodes. The settings are: binary protocol, ketama,
> connect_timeout=100, retry_timeout=1, number_of_replica=1, no_block=0
> I try to get an item on this empty farm. It tries the first node, then
> the replica, and returns false -> ok
> Then, I saw in the code that in the case of a set, we set the key on
> the first node, and then we buffer the set to the replica. So I call
> set enough times to flush this buffer. At that point, I have
>
> first node -> 368 keys
> second node -> 364 keys
> third node -> 0 keys
>
> My keys are "Key1" and "Key2", always stored with the same value.
> I try to get my item, it finds it in the first node -> ok
> Then I kill the first node, and try to get Key1. As NO_BLOCK is set to
> 0, it first flushes the remaining calls in the buffer, which means we
> now have 368 keys in the second node.
> At that point, I expect my get to return the correct value (it should
> try the first node, see that the connection is down, put it in
> timeout state, and try the replica).
> But in practice, it returns false, and the second node didn't receive
> any query.
> If I call get a second time on this same key, this time it finds it in
> the replica... I really don't get why.
>
> Then I start the first node again. auto_eject is set to false, so I'm
> sure it has not been removed from the ketama continuum. At that point,
> if I try to get the key, it should try to find it on the first node
> (and fail since this node is empty). But in fact, it goes directly to
> the second node. The first node doesn't receive queries anymore, even
> if I send a lot of them. I don't understand why either.
>
> I hope someone will be able to help me.
> Cheers
> Nico
