Re: Issue 363 in memcached: MemcachePool::get(): Server 127.0.0.1 (tcp 11211, udp 0) failed with: Network timeout

dormando Wed, 07 May 2014 17:20:00 -0700

Hey,

try this branch:
https://github.com/dormando/memcached/tree/double_close


so far as I can tell that emulates the behavior in .17...

to build:
./autogen.sh && ./configure && make

Run it in screen like you were doing with the other tests and see if it
prints "ERROR: Double Close [somefd]". If it prints that once and then stops,
I guess that's what .17 was doing... if it spams that line over and over,
then something else may have changed.

I'm mostly convinced something about your OS or build is corrupt, but I
have no idea what it is. The only other thing I can think of is to
instrument .17 a bit more and have you try that (with the connection code
laid out the old way, but with a conn_closed flag to detect a double close
attempt), and see if the old .17 still did it.
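
For illustration, a conn_closed-style guard might look roughly like the
sketch below. This is not the code from the double_close branch or from .17;
demo_conn, demo_conn_open, demo_conn_close and the plain curr_conns counter
are made-up stand-ins, and the real connection code also has to deal with
libevent and locking. It only shows why a flag turns a second close into a
logged no-op instead of a second decrement of the connection count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical, simplified connection slot; NOT memcached's real struct conn. */
typedef struct {
    int  fd;           /* socket descriptor owned by this slot */
    bool conn_closed;  /* set once the slot has been closed */
} demo_conn;

/* Stand-in for the curr_conns stat (assumption: a plain int, no locking). */
static int curr_conns = 0;

/* Take ownership of fd and count the connection. */
void demo_conn_open(demo_conn *c, int fd) {
    assert(fd >= 0);
    c->fd = fd;
    c->conn_closed = false;
    curr_conns++;
}

/* Close the slot once. Returns true if it was closed by this call,
 * false if a double close was detected (nothing is touched in that case). */
bool demo_conn_close(demo_conn *c) {
    if (c->conn_closed) {
        fprintf(stderr, "ERROR: Double Close [%d]\n", c->fd);
        return false;
    }
    c->conn_closed = true;
    close(c->fd);
    curr_conns--;  /* without the guard above, a second call would drive this negative */
    return true;
}
```

Without the flag, the second call would close() a descriptor number that may
already belong to a different connection and decrement the counter again.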

On Tue, 6 May 2014, [email protected] wrote:

> Changing from 4 threads to 1 seems to have resolved the problem. No timeouts 
> since. Should I set to 2 threads and wait and see how things go?
>
> On Tuesday, May 6, 2014 12:07:08 AM UTC-7, Dormando wrote:
>       and how'd that work out?
>
>       Still no other reports :/ a few thousand more downloads of .19...
>
>       On Sun, 4 May 2014, [email protected] wrote:
>
>       > I'm going to try switching threads from 4 to 1. This host, web2, is
>       > the only one I am seeing it on, but it is also the only host that gets any
>       > real traffic. Super frustrating.
>       >
>       > On Sunday, May 4, 2014 10:12:08 AM UTC-7, Dormando wrote:
>       >       I'm stumped. (also, your e-mails aren't updating the ticket...).
>       >
>       >       It's impossible for a connection to get into the closed state without
>       >       having event_del() and close() called on the socket. A socket slot isn't
>       >       event_add()'ed again until after the state is reset to 'init_state'.
>       >
>       >       There was no code path for event_del to actually fail so far as I could
>       >       see.
>       >
>       >       I've e-mailed steven grimm for ideas but either that's not his e-mail
>       >       anymore or he's not going to respond.
>       >
>       >       I really don't know. I guess the old code would've just called conn_close
>       >       again by accident... I don't see how the logic changed in any significant
>       >       way in .18. Though again, if it happened with any frequency people's
>       >       curr_conns stat would go negative.
>       >
>       >       So... either that always happened and we never noticed, or your particular
>       >       OS is corrupt. There're probably 10,000+ installs of .18+ now and only one
>       >       complaint, so I'm a little hesitant to spend a ton of time on this until
>       >       we get more reports.
>       >
>       >       You should downgrade to .17.
>       >
>       >       On Sun, 4 May 2014, [email protected] wrote:
>       >
>       >       > Damn it, got network timeout. CPU 3 is using 100% cpu from memcached.
>       >       > Here is the result of stat to verify using new version of memcached and libevent:
>       >       >
>       >       > STAT version 1.4.19
>       >       > STAT libevent 2.0.18-stable
>       >       >
>       >       >
>       >       > On Saturday, May 3, 2014 11:55:31 PM UTC-7, [email protected] wrote:
>       >       >       Just upgraded all 5 web-servers to memcached 1.4.19 with libevent 2.0.18.
>       >       >       Will advise if I see memcached timeouts. Should be good though.
>       >       >
>       >       > Thanks so much for all the help and patience. Really appreciated.
>       >       >
>       >       > On Friday, May 2, 2014 10:20:26 PM UTC-7, [email protected] wrote:
>       >       >       Updates:
>       >       >               Status: Invalid
>       >       >
>       >       >       Comment #20 on issue 363 by [email protected]: MemcachePool::get(): Server
>       >       >       127.0.0.1 (tcp 11211, udp 0) failed with: Network timeout
>       >       >       http://code.google.com/p/memcached/issues/detail?id=363
>       >       >
>       >       >       Any repeat crashes? I'm going to close this. it looks like remi
>       >       >       shipped .19. reopen or open a new one if it hangs in the same way somehow...
>       >       >
>       >       >       Well. 19 won't be printing anything, and it won't hang, but if it's
>       >       >       actually our bug and not libevent it would end up spinning CPU. Keep an eye
>       >       >       out I guess.
>       >       >
>       >       >       --
>       >       >       You received this message because this project is configured to send all
>       >       >       issue notifications to this address.
>       >       >       You may adjust your notification preferences at:
>       >       >       https://code.google.com/hosting/settings
>       >       >
>       >       > --
>       >       >
>       >       > ---
>       >       > You received this message because you are subscribed to the Google Groups "memcached" group.
>       >       > To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>       >       > For more options, visit https://groups.google.com/d/optout.
