What happens if the add command fails because of an unlikely network error?
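
To make the question concrete, here is a minimal sketch in Python of an add()-based lock in which a network error is reported separately from "key already exists". Everything in it is an assumption for illustration: FakeMemcache is an in-memory stand-in, not the client library used on the webservers, and try_lock, fail_next_reply and the key names are made up. It only shows the ambiguity I am asking about; it does not claim that this is what happens in the cluster.

```python
import socket


class FakeMemcache:
    """In-memory stand-in for a memcached client (hypothetical, illustration only)."""

    def __init__(self):
        self._store = {}
        # When True, the next successful add() stores the key but "loses" the reply.
        self.fail_next_reply = False

    def add(self, key, value):
        """Store value only if key is absent. True if stored, False if it existed."""
        if key in self._store:
            return False
        self._store[key] = value
        if self.fail_next_reply:
            self.fail_next_reply = False
            # The key was stored, but the caller never sees the answer.
            raise socket.timeout("reply lost")
        return True


def try_lock(client, key, owner):
    """Return 'acquired', 'held' or 'unknown' instead of a bare True/False."""
    try:
        stored = client.add(key, owner)
    except OSError:
        # Network failure: we cannot tell whether the key was stored or not.
        return "unknown"
    return "acquired" if stored else "held"


client = FakeMemcache()
print(try_lock(client, "lock:pk:42", "web01"))  # acquired
print(try_lock(client, "lock:pk:42", "web02"))  # held
client.fail_next_reply = True
print(try_lock(client, "lock:pk:43", "web01"))  # unknown
```

If a client library folds the "unknown" case into a plain false (or the calling code treats an error as "go ahead"), the caller can no longer tell a held lock from a failed request. In that case the insert decision after a failed add has to be made explicitly.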

elSchrom <[email protected]> wrote:

>Hi Diez,
>
>On 14 Oct., 11:39, Dieter Schmidt <[email protected]> wrote:
>> To me it sounds like a configuration problem on the webservers or an
>> availability/accessibility issue.
>> If, for example, all machines are accessible, the locking key resides on
>> machine x. If one of the webservers differs in its configuration, it can
>> happen that the key is added a second time as new somewhere else in the
>> continuum. As a result you will have a second insert into your db.
>>
>> What do you think? Possible?
>
>
>Possible for sure, but this should produce more problems, such as massive
>numbers of redundant cached items, because some clients would have a
>different continuum. This is most likely not happening. The current failure
>rate is below 0.0001% and the failures appear on different frontend servers.
>It feels like a very unlikely thing is happening here: a massive number of
>add() calls, with a very small number of failures.
>
>>
>> elSchrom <[email protected]> wrote:
>>
>>
>>
>> >On 14 Oct., 10:00, dormando <[email protected]> wrote:
>> >> > Our 50+ consistent hashing cluster is very reliable in normal
>> >> > operation; incr/decr, get, set, multiget, etc. are not a problem. If
>> >> > we had a problem with keys on wrong servers in the continuum, we
>> >> > would see more problems than we currently do.
>> >> > The cluster is always under relatively high load (the number of
>> >> > connections, for example, is very high due to 160+ webservers in
>> >> > front). We are now seeing, in a very few cases, that this
>> >> > locking mechanism does not work. Two different clients try to lock
>> >> > the same object (if you want to prevent multiple inserts into a
>> >> > database on the same
>> >> > primary key, you have to explicitly set one key valid for all clients
>> >> > and not a key with unique hashes in it). It works millions of times as
>> >> > expected (we are generating a large number of user-triggered database
>> >> > inserts (~60/sec.)
>> >> > with this construct). But a handful of locks do not work and show
>> >> > the behaviour described. So now my question is again: is it conceivable
>> >> > (even if it is very implausible) that
>> >> > a multithreaded memd does not provide a 100% atomic add()?
>>
>> >> restart memcached with -t 1 and see if it stops happening. I already said
>> >> it's not possible.
>>
>> >Yeah, right. :-) Restarting all memd instances is not an option. Can
>> >you explain why it is not possible?
