What happens if the add command fails because of an unlikely network error?

elSchrom <[email protected]> wrote:
>Hi Diez,
>
>On 14 Oct., 11:39, Dieter Schmidt <[email protected]> wrote:
>> To me it sounds like a configuration problem on the webservers or an
>> availability/accessibility issue.
>> If, for example, all machines are accessible, the locking key resides on
>> machine x. If one of the webservers differs in its configuration, it can
>> happen that the key is added a second time as new somewhere else in the
>> continuum. As a result you will have a second insert into your db.
>>
>> What do you think? Possible?
>
>
>Possible for sure, but this should produce more problems, like massive
>redundant cached items, because some clients would have a different type of
>continuum. This is most likely not happening. The current failure rate
>is below 0.0001% and the failures appear on different frontend servers. It
>feels like a very unlikely thing is happening here, given the massive
>number of add() calls and the very small number of failures.
>
>>
>> elSchrom <[email protected]> wrote:
>>
>> >
>> >On 14 Oct., 10:00, dormando <[email protected]> wrote:
>> >> > our 50+ consistent hashing cluster is very reliable on normal
>> >> > operations; incr/decr, get, set, multiget, etc. are not a problem. If
>> >> > we had a problem with keys on wrong servers in the continuum, we
>> >> > should have more problems, which we currently do not.
>> >> > The cluster is always under relatively high load (the number of
>> >> > connections, for example, is very high due to 160+ webservers in the
>> >> > front). We now suspect that, in a very few cases, this
>> >> > locking mechanism does not work. Two different clients try to lock
>> >> > the same object (if you want to prevent multiple inserts in a
>> >> > database on the same
>> >> > primary key you have to explicitly set one key valid for all clients
>> >> > and not a key with unique hashes in it). It works millions of times as
>> >> > expected (we are generating a large number of user-triggered database
>> >> > inserts (~60/sec.)
>> >> > with this construct). But a handful of locks do not work and show
>> >> > the behaviour described. So now my question is again: is it conceivable
>> >> > (even if it is very implausible) that
>> >> > a multithreaded memd does not provide a 100% atomic add()?
>> >>
>> >> restart memcached with -t 1 and see if it stops happening. I already said
>> >> it's not possible.
>> >
>> >Yeah, right. :-) Restarting all memd instances is not an option. Can
>> >you explain why it is not possible?
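
To make the network-error question concrete, here is a minimal Python sketch of the add()-based lock discussed in this thread. It is an illustration under assumptions, not anyone's actual code: FakeCache is a hypothetical in-memory stand-in for a single memcached node, and the sketch assumes the client library reports a network failure as an exception instead of silently failing over to another node. The point it shows is that the caller inserts only when add() positively reports "stored"; an unknown outcome is treated as "lock not acquired".

```python
class FakeCache:
    """Hypothetical in-memory stand-in for one memcached node."""

    def __init__(self):
        self._data = {}
        self.fail_next = False  # set to True to simulate a network error

    def add(self, key, value):
        # Same contract as memcached add: store only if the key is absent.
        if self.fail_next:
            self.fail_next = False
            raise ConnectionError("simulated network error")
        if key in self._data:
            return False
        self._data[key] = value
        return True


def try_lock(cache, key, owner):
    """Return True only if add() positively reported that the key was stored."""
    try:
        return cache.add(key, owner) is True
    except ConnectionError:
        # Outcome unknown: do not assume we hold the lock, and do not
        # retry against a different node. Skip the insert instead.
        return False


def insert_once(cache, db, primary_key, owner):
    """Append a row to db (a plain list here) only if the lock was acquired."""
    if try_lock(cache, "lock:" + primary_key, owner):
        db.append((primary_key, owner))
        return True
    return False
```

If a client instead reacted to the error by rehashing the key to another server, or by proceeding with the insert anyway, two clients could both believe they hold the lock even though add() itself is atomic on each node. Whether that is what happens in the setup described above is not established in this thread.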
