Hi Diez,

On 14 Oct., 11:39, Dieter Schmidt <[email protected]> wrote:
> To me it sounds like a configuration problem on the webservers or an
> availability/accessibility issue.
> If, for example, all machines are accessible, the locking key resides on
> machine x. If one of the webservers differs in its configuration, it can
> happen that the key is added a second time as new somewhere else in the
> continuum. As a result you will have a second insert into your db.
>
> What do you think? Possible?


Possible for sure, but this should produce more problems, like massive
numbers of redundant cached items, because some clients would have a
different continuum. This is most likely not happening. The current
failure rate is below 0.0001%, and the failures appear on different
frontend servers. It feels like a very unlikely event is showing up here
only because of the massive number of add() calls, with a very small
number of failures.
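
To illustrate why I would expect a continuum mismatch to be much more
visible, here is a rough sketch in Python. It is not the hashing our
clients actually use: the MD5-based ring, the 100 points per server and
the server addresses are all assumptions made up for the example. It
only counts how many keys land on a different server when one client is
missing a single server from its list.

```python
import bisect
import hashlib


def build_continuum(servers, points_per_server=100):
    """Return a sorted list of (hash, server) points on the ring.

    Assumption: MD5 with 100 points per server; real clients differ.
    """
    ring = []
    for server in servers:
        for i in range(points_per_server):
            digest = hashlib.md5(f"{server}-{i}".encode()).hexdigest()
            ring.append((int(digest[:8], 16), server))
    ring.sort()
    return ring


def lookup(ring, key):
    """Map a key to the first ring point at or after its hash."""
    h = int(hashlib.md5(key.encode()).hexdigest()[:8], 16)
    idx = bisect.bisect_left(ring, (h, ""))
    if idx == len(ring):  # wrap around to the start of the ring
        idx = 0
    return ring[idx][1]


def count_mismatches(servers_a, servers_b, keys):
    """Count keys that two differently configured clients route differently."""
    ring_a = build_continuum(servers_a)
    ring_b = build_continuum(servers_b)
    return sum(1 for k in keys if lookup(ring_a, k) != lookup(ring_b, k))


if __name__ == "__main__":
    # Hypothetical addresses, for illustration only.
    servers = [f"10.0.0.{i}:11211" for i in range(50)]
    keys = [f"lock:{i}" for i in range(5000)]
    broken = servers[:-1]  # one client is missing one server
    print(count_mismatches(servers, broken, keys), "of", len(keys), "keys differ")
```

In a sketch like this, a single missing server already moves a share of
keys on the order of one server's slice of the ring, which is far above
the failure rate we see.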

>
> elSchrom <[email protected]> wrote:
>
>
>
> >On 14 Oct., 10:00, dormando <[email protected]> wrote:
> >> > Our 50+ node consistent-hashing cluster is very reliable in normal
> >> > operation; incr/decr, get, set, multiget, etc. are not a problem. If
> >> > we had a problem with keys on wrong servers in the continuum, we
> >> > would see more problems, which we currently do not.
> >> > The cluster is always under relatively high load (the number of
> >> > connections, for example, is very high due to 160+ webservers in the
> >> > front). We are now seeing, in a very few cases, that this
> >> > locking mechanism does not work. Two different clients try to lock
> >> > the same object (if you want to prevent multiple inserts in a
> >> > database on the same primary key, you have to explicitly set one key
> >> > valid for all clients, not a key with unique hashes in it). It works
> >> > millions of times as expected (we are generating a large number of
> >> > user-triggered database inserts, ~60/sec., with this construct). But
> >> > a handful of locks do not work and show the behaviour described. So
> >> > now my question is again: is it conceivable (even if it is very
> >> > implausible) that a multithreaded memd does not provide a 100%
> >> > atomic add()?
>
> >> restart memcached with -t 1 and see if it stops happening. I already said
> >> it's not possible.
>
> >Yeah, right. :-) Restarting all memd instances is not an option. Can
> >you explain why it is not possible?
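
For anyone following along, this is roughly the locking construct in
question, as a minimal Python sketch. FakeCache is an in-process stand-in
that I made up for the example, not a real memcached client; it only
mimics the documented add() semantics (store the value only if the key
does not exist yet). The function names and the "lock:" key prefix are
hypothetical as well.

```python
import threading


class FakeCache:
    """In-process stand-in for a memcached server; NOT a real client."""

    def __init__(self):
        self._data = {}
        self._mutex = threading.Lock()

    def add(self, key, value):
        """Store value only if key is absent; return True on success."""
        with self._mutex:
            if key in self._data:
                return False
            self._data[key] = value
            return True


def insert_once(cache, primary_key, inserted_rows):
    """Try to take the shared lock key; only the winner does the insert."""
    lock_key = f"lock:{primary_key}"  # one key valid for all clients
    if not cache.add(lock_key, "1"):
        return False  # another client already holds the lock
    inserted_rows.append(primary_key)  # stands in for the database insert
    return True


def run_contenders(n_clients=8, primary_key=42):
    """Let several clients race for the same primary key."""
    cache = FakeCache()
    rows = []
    threads = [
        threading.Thread(target=insert_once, args=(cache, primary_key, rows))
        for _ in range(n_clients)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return rows


if __name__ == "__main__":
    print(run_contenders())
```

As long as add() is atomic and every client talks to the same server for
that key, exactly one contender should get through. In a real setup the
lock key would also need an expiry time so that it does not stay around
forever; the sketch leaves that out.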
