On 14 Oct., 14:01, Dieter Schmidt <[email protected]> wrote:
> What happens if the add command fails because of an unlikely network error?

The situation is this: two different clients call add() with the
same key at the same time, and both get true. Assuming the key
lives on the same machine for both clients, this has to be a
threading problem or a bug in add(). It breaks the atomic behaviour
we are expecting. But we cannot prove that the key is on the same
server at that moment, because it is highly volatile. It is just
speculation; if keys were not stored correctly due to consistent
hashing problems, we would expect more problems.
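
To make the expected behaviour concrete, here is a minimal sketch in
Python of the lock pattern as I understand it. FakeMemcache,
insert_once and race are hypothetical names used only for
illustration, and the in-memory store merely imitates the semantics
of add() (store only if the key is absent); it is not memcached
itself and not our real client code. If add() is atomic, exactly one
of the racing clients should perform the insert.

```python
import threading


class FakeMemcache:
    """In-memory stand-in for a memcached client (hypothetical, illustration only)."""

    def __init__(self):
        self._data = {}
        self._mutex = threading.Lock()

    def add(self, key, value):
        # Store the value only if the key is absent.
        # True means this caller created the key, False means it already existed.
        with self._mutex:
            if key in self._data:
                return False
            self._data[key] = value
            return True


def insert_once(cache, lock_key, do_insert):
    """Run do_insert() only if this caller wins the add() on lock_key."""
    if cache.add(lock_key, "1"):
        do_insert()
        return True
    return False


def race(n_clients=50):
    """Let n_clients threads race for the same lock key; return the insert count."""
    cache = FakeMemcache()
    inserts = []
    barrier = threading.Barrier(n_clients)

    def client(client_id):
        barrier.wait()  # release all clients at roughly the same moment
        insert_once(cache, "lock:row:42", lambda: inserts.append(client_id))

    threads = [threading.Thread(target=client, args=(i,)) for i in range(n_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(inserts)
```

What we observe in production corresponds to race() occasionally
returning 2. In this sketch that can only happen if add() is not
atomic or if the two clients talk to different stores.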

>
> elSchrom <[email protected]> wrote:
>
> >Hi Diez,
>
> >On 14 Oct., 11:39, Dieter Schmidt <[email protected]> wrote:
> >> To me it sounds like a configuration problem on the webservers or an
> >> availability/accessibility issue.
> >> If, for example, all machines are accessible, the locking key resides on
> >> machine x. If one of the webservers differs in its configuration, it can
> >> happen that the key is added a second time as new somewhere else in the
> >> continuum. As a result you will have a second insert into your db.
>
> >> What do you think? Possible?
>
> >Possible for sure, but this should produce more problems, such as massive
> >numbers of redundant cached items, because some clients would have a
> >different continuum. This is most likely not happening. The current failure
> >rate is below 0.0001%, and the failures appear on different frontend servers.
> >It feels like a very unlikely thing is happening here: a massive
> >number of add() calls with a very small number of failures.
>
> >> elSchrom <[email protected]> wrote:
>
> >> >On 14 Oct., 10:00, dormando <[email protected]> wrote:
> >> >> > our 50+ consistent hashing cluster is very reliable in normal
> >> >> > operation; incr/decr, get, set, multiget, etc. are not a problem. If
> >> >> > we had a problem with keys on wrong servers in the continuum, we
> >> >> > should see more problems, which we currently do not.
> >> >> > The cluster is always under relatively high load (the number of
> >> >> > connections, for example, is very high due to 160+ webservers in the
> >> >> > front). We are now seeing, in a very few cases, that this
> >> >> > locking mechanism does not work. Two different clients try to take the
> >> >> > lock on the same object (if you want to prevent multiple inserts in a
> >> >> > database on the same
> >> >> > primary key, you have to explicitly use one key valid for all clients
> >> >> > and not a key with unique hashes in it). It works millions of times as
> >> >> > expected (we are generating a large number of user-triggered database
> >> >> > inserts (~60/sec.)
> >> >> > with this construct). But a handful of locks do not work and show
> >> >> > the behaviour described. So now my question is again: is it conceivable
> >> >> > (even if it is very implausible) that
> >> >> > a multithreaded memd does not provide a 100% atomic add()?
>
> >> >> restart memcached with -t 1 and see if it stops happening. I already
> >> >> said it's not possible.
>
> >> >Yeah, right. :-) Restarting all memd instances is not an option. Can
> >> >you explain why it is not possible?
