Hi Diez,

On 14 Oct., 11:39, Dieter Schmidt wrote:
> To me it sounds like a configuration problem on the webservers or an
> availability/accessibility issue.
> If, for example, all machines are accessible, the locking key resides on
> machine x. If one of the webservers differs in its configuration, it can
> happen that the key is added a second time, as new, somewhere else in the
> continuum. As a result you will get a second insert into your db.
>
> What do you think? Possible?
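Dieter's scenario can be illustrated with a toy model. The sketch below is hypothetical and is not the client library used in this thread. `HashRing` is a simplified ketama-style ring (MD5-derived points, 100 per server). `FakeMemcachedServer` stands in for a memcached instance whose add() only stores a key that is absent. The server names mc1 to mc5 are made up. One webserver builds its ring from all five servers, and another builds it from a configuration that lacks mc3. A lock key that the first ring places on mc3 lands on a different server on the second ring, so both add() calls succeed and both clients believe they hold the lock.

```python
import bisect
import hashlib


class HashRing:
    """Simplified ketama-style consistent-hashing ring (hypothetical sketch).

    Each server gets `points_per_server` positions on a 32-bit ring,
    derived from MD5("<server>-<i>"). A key belongs to the first
    position at or after the key's own hash, wrapping around at the end.
    """

    def __init__(self, servers, points_per_server=100):
        self._owner = {}  # ring position -> server name
        for server in servers:
            for i in range(points_per_server):
                pos = self._hash(f"{server}-{i}")
                self._owner.setdefault(pos, server)  # first server keeps a colliding position
        self._points = sorted(self._owner)

    @staticmethod
    def _hash(value):
        # First 4 bytes of the MD5 digest as an unsigned 32-bit integer.
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:4], "big")

    def server_for(self, key):
        idx = bisect.bisect_left(self._points, self._hash(key))
        if idx == len(self._points):
            idx = 0  # past the last position: wrap around to the first
        return self._owner[self._points[idx]]


class FakeMemcachedServer:
    """Stand-in for one memcached instance: add() stores only absent keys."""

    def __init__(self):
        self.data = {}

    def add(self, key, value):
        if key in self.data:
            return False
        self.data[key] = value
        return True


SERVERS = ["mc1", "mc2", "mc3", "mc4", "mc5"]  # hypothetical server names
backends = {name: FakeMemcachedServer() for name in SERVERS}

ring_ok = HashRing(SERVERS)                              # correctly configured webserver
ring_bad = HashRing([s for s in SERVERS if s != "mc3"])  # webserver whose config lacks mc3


def try_lock(ring, key, client_id):
    """Send add() to whichever server this client's ring picks for the key."""
    return backends[ring.server_for(key)].add(key, client_id)


# Pick a lock key that the correctly configured ring places on mc3.
lock_key = next(k for k in (f"lock:order:{n}" for n in range(10_000))
                if ring_ok.server_for(k) == "mc3")

print(try_lock(ring_ok, lock_key, "web-a"))   # True: stored on mc3
print(try_lock(ring_bad, lock_key, "web-b"))  # True: stored on another server
```

In this model, keys that the full ring does not place on mc3 map to the same server on both rings, so only the keys owned by the missing server are affected by the mismatch.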
Possible for sure, but that should cause more problems, such as masses of
redundantly cached items, because some clients would be using a different
continuum. That is most likely not happening. The current failure rate is
below 0.0001%, and the failures appear on different frontend servers. It
feels like a very unlikely event is occurring here: we make a massive number
of add() calls, and only a very small number of them fail.

>
> elSchrom wrote:
>
> >
> >On 14 Oct., 10:00, dormando wrote:
> >> > Our 50+ node consistent hashing cluster is very reliable in normal
> >> > operation; incr/decr, get, set, multiget, etc. are not a problem. If
> >> > we had a problem with keys on the wrong servers in the continuum, we
> >> > would see more problems, which we currently do not.
> >> > The cluster is always under relatively high load (the number of
> >> > connections, for example, is very high due to the 160+ webservers in
> >> > front). We are now experiencing, in a very few cases, that this
> >> > locking mechanism does not work. Two different clients try to lock
> >> > the same object (if you want to prevent multiple inserts into a
> >> > database on the same primary key, you have to explicitly set one key
> >> > valid for all clients, not a key with unique hashes in it). It works
> >> > millions of times as expected (we generate a large number of
> >> > user-triggered database inserts, ~60/sec., with this construct), but
> >> > a handful of locks do not work and show the behaviour described. So
> >> > my question again: is it conceivable (even if very implausible) that
> >> > a multithreaded memd does not provide a 100% atomic add()?
>
> >> restart memcached with -t 1 and see if it stops happening. I already said
> >> it's not possible.
>
> >Yeah, right. :-) Restarting all memd instances is not an option. Can
> >you explain why it is not possible?
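The locking construct described in the thread (one shared lock key per primary key, taken with add() before the database insert) can be sketched in-process as follows. This is a hypothetical model, not memcached itself. `AtomicAddStore` assumes that add() checks and stores in one atomic step, which matches memcached's protocol description of add as storing data only if the server does not already hold data for that key; a mutex stands in for that guarantee. The key format `insert-lock:<id>`, the `insert_once` helper and the primary key 4711 are made up for illustration, and the expiry time a real lock would be given is omitted. With 32 threads racing on the same primary key, exactly one performs the insert.

```python
import threading


class AtomicAddStore:
    """In-process stand-in for a single memcached instance (hypothetical).

    Assumes add() is atomic: the "does the key exist?" check and the store
    happen as one step, modelled here with a mutex. Only the first add()
    for a key succeeds; the expiry time a real lock would carry is omitted.
    """

    def __init__(self):
        self._data = {}
        self._mutex = threading.Lock()

    def add(self, key, value):
        with self._mutex:
            if key in self._data:
                return False
            self._data[key] = value
            return True


def insert_once(store, primary_key, client_id, inserted):
    """Take the shared lock key for this primary key, then 'insert'."""
    lock_key = f"insert-lock:{primary_key}"  # hypothetical key format
    if not store.add(lock_key, client_id):
        return False  # another client already holds the lock
    inserted.append((primary_key, client_id))  # stands in for the DB insert
    return True


store = AtomicAddStore()
inserted = []
start = threading.Barrier(32)


def worker(client_id):
    start.wait()  # release all threads together to maximise contention
    insert_once(store, 4711, client_id, inserted)


threads = [threading.Thread(target=worker, args=(f"web-{i}",)) for i in range(32)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(inserted))  # 1
```

For scale: at the ~60 lock-protected inserts per second mentioned in the thread, a day amounts to about 60 × 86,400 ≈ 5.2 million add() calls, so a failure rate below 0.0001% corresponds to at most about five failed locks per day.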
