Re: all our memcached servers failing w/ EPIPE from time to time, but ...

dormando Mon, 09 Jan 2012 09:20:28 -0800

> The phenomenon was/is that we had lots of memcached clients
> getting an EPIPE, and once we had identified the failing
> memcached node, we could not connect to it using fresh tiny
> test scripts, nor directly either.
>
> First we thought it *MIGHT* also be a hardware issue, or whatnot,
> and we wrote a monitoring test (checking every memcached node
> every 10 seconds by individually connecting to it, test-writing, rereading
> and comparing the values, and reporting on error).
>
> The result, though, is that we get EPIPEs and even rare EOFs
> quite regularly, not so often that we (as far as we know yet) should
> care, but we now know of **two** cases where
> we had a peak EPIPE scenario (one very big, one somewhat big),
> in one of which only a restart helped.
>
> We came to the conclusion that it definitely might be some issue
> with the software itself, and so we seek help from upstream, in the hope
> of getting any kind of advice, help, or whatever you think
> might help us get such things fixed.
>
> FYI: the clients are all ruby using standard gem "redis" version 2.1.1
> and memcached version 1x 1.4.4 and 4x 1.4.2.


Any idea if you're hitting the max connections limit? Once you do, you
won't be able to connect anymore until some connections bleed off. EPIPE
could be your client's way of handling a connect timeout.
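
One way to check this from the outside is to compare the server's current
connection count against its configured limit. The snippet below is only a
rough sketch, not an official tool: the helper names (parse_stats,
fetch_stats, connection_headroom) are made up for illustration, and it
assumes the text protocol's "stats" output carries curr_connections and
that "stats settings" reports maxconns on your version.

```python
import socket


def parse_stats(raw):
    """Turn 'STAT name value' lines from a stats response into a dict."""
    stats = {}
    for line in raw.splitlines():
        parts = line.strip().split(" ", 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def fetch_stats(host, port=11211, command="stats", timeout=2.0):
    """Send a stats command over the text protocol and parse the reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii") + b"\r\n")
        buf = b""
        # A normal reply ends with END; stop early on an ERROR line.
        while not (buf.endswith(b"END\r\n") or buf.startswith(b"ERROR")):
            chunk = sock.recv(4096)
            if not chunk:
                break
            buf += chunk
    return parse_stats(buf.decode("ascii", "replace"))


def connection_headroom(stats, settings):
    """Connections left before the limit (assumed stat names, see above)."""
    return int(settings["maxconns"]) - int(stats["curr_connections"])


if __name__ == "__main__":
    # Hypothetical node address; replace with one of your own servers.
    host = "127.0.0.1"
    stats = fetch_stats(host)
    settings = fetch_stats(host, command="stats settings")
    print("headroom:", connection_headroom(stats, settings))
```

If the headroom sits near zero around the time the EPIPEs spike, the
connection limit is a likely suspect. Note that this check itself needs a
free connection, so it has to run before the limit is actually reached.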

Can you upgrade one of them to 1.4.9, and perhaps even start it with -o
maxconns_fast? That way you'll rule out many of the bugs we've fixed
since then, and if the one running maxconns_fast starts EPIPE'ing, you'll
see an error message when you attempt to connect to it if the problem is
related to a max conns error.
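
A small connect probe can make that error visible in your monitoring. This
is a sketch under assumptions: probe and classify_greeting are hypothetical
names, and it assumes that a server started with -o maxconns_fast writes a
line beginning with "ERROR" right after accepting a connection it has to
reject, while a healthy server stays silent until it is sent a command.

```python
import socket


def classify_greeting(data):
    """Interpret whatever the server sent before we issued any command."""
    if data.startswith(b"ERROR"):
        return "rejected: " + data.decode("ascii", "replace").strip()
    if data == b"":
        return "closed"
    return "unexpected: " + data.decode("ascii", "replace").strip()


def probe(host, port=11211, timeout=1.0):
    """Connect and wait briefly for an unsolicited error line."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        return "connect failed: %s" % exc
    with sock:
        try:
            data = sock.recv(256)
        except socket.timeout:
            # Silence is the normal case: the server waits for a command.
            return "accepted"
        except OSError as exc:
            return "read failed: %s" % exc
    return classify_greeting(data)


if __name__ == "__main__":
    # Hypothetical node address; replace with the upgraded server.
    print(probe("127.0.0.1"))
```

Logging the probe result next to the client-side EPIPE counts should show
whether the two line up.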

-Dormando
