Hey, try this branch: https://github.com/dormando/memcached/tree/double_close
So far as I can tell, that emulates the behavior in .17. To build:

    ./autogen.sh && ./configure && make

Run it in screen like you were doing with the other tests, and see if it prints "ERROR: Double Close [somefd]". If it prints that once and then stops, I guess that's what .17 was doing. If it spams that message, then something else may have changed.

I'm mostly convinced something about your OS or build is corrupt, but I have no idea what it is. The only other thing I can think of is to instrument .17 a bit more and have you try that (with the connection code laid out the old way, but with a conn_closed flag to detect a double-close attempt), and see if the old .17 still did it.

On Tue, 6 May 2014, [email protected] wrote:

> Changing from 4 threads to 1 seems to have resolved the problem. No timeouts since. Should I set it to 2 threads and wait and see how things go?
>
> On Tuesday, May 6, 2014 12:07:08 AM UTC-7, Dormando wrote:
> And how'd that work out?
>
> Still no other reports :/ A few thousand more downloads of .19...
>
> On Sun, 4 May 2014, [email protected] wrote:
>
> > I'm going to try switching threads from 4 to 1. This host, web2, is the only one I am seeing it on, but it is also the only host that gets any real traffic. Super frustrating.
> >
> > On Sunday, May 4, 2014 10:12:08 AM UTC-7, Dormando wrote:
> > I'm stumped. (Also, your e-mails aren't updating the ticket.)
> >
> > It's impossible for a connection to get into the closed state without having event_del() and close() called on the socket. A socket slot isn't event_add()'ed again until after the state is reset to 'init_state'.
> >
> > As far as I could see, there was no code path where event_del() could actually fail.
> >
> > I've e-mailed Steven Grimm for ideas, but either that's not his e-mail anymore or he's not going to respond.
> >
> > I really don't know. I guess the old code would've just called conn_close again by accident. I don't see how the logic changed in any significant way in .18, though again, if it happened with any frequency, people's curr_conns stat would go negative.
> >
> > So either that always happened and we never noticed, or your particular OS is corrupt. There are probably 10,000+ installs of .18+ now and only one complaint, so I'm a little hesitant to spend a ton of time on this until we get more reports.
> >
> > You should downgrade to .17.
> >
> > On Sun, 4 May 2014, [email protected] wrote:
> >
> > > Damn it, got a network timeout. memcached is using 100% of CPU 3.
> > > Here is the stats output, verifying that the new versions of memcached and libevent are in use:
> > >
> > > STAT version 1.4.19
> > > STAT libevent 2.0.18-stable
> > >
> > > On Saturday, May 3, 2014 11:55:31 PM UTC-7, [email protected] wrote:
> > > Just upgraded all 5 web servers to memcached 1.4.19 with libevent 2.0.18. Will advise if I see memcached timeouts. Should be good, though.
> > >
> > > Thanks so much for all the help and patience. Really appreciated.
> > >
> > > On Friday, May 2, 2014 10:20:26 PM UTC-7, [email protected] wrote:
> > > Updates:
> > > Status: Invalid
> > >
> > > Comment #20 on issue 363 by [email protected]: MemcachePool::get(): Server 127.0.0.1 (tcp 11211, udp 0) failed with: Network timeout
> > > http://code.google.com/p/memcached/issues/detail?id=363
> > >
> > > Any repeat crashes? I'm going to close this. It looks like Remi shipped .19. Reopen this or open a new one if it hangs in the same way somehow.
> > >
> > > Well, .19 won't print anything, and it won't hang, but if it's actually our bug and not libevent's, it would end up spinning the CPU. Keep an eye out, I guess.
> > >
> > > --
> > > You received this message because this project is configured to send all
> > > issue notifications to this address.
> > > You may adjust your notification preferences at:
> > > https://code.google.com/hosting/settings
> > >
> > > --
> > >
> > > ---
> > > You received this message because you are subscribed to the Google Groups "memcached" group.
> > > To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> > > For more options, visit https://groups.google.com/d/optout.
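A minimal C sketch of the kind of double-close check described in the thread: a conn_closed flag that reports "ERROR: Double Close [fd]" instead of closing a second time. This is not the code on the double_close branch. sketch_conn, sketch_conn_open() and sketch_conn_close() are hypothetical names, curr_conns here is a plain counter standing in for the real stat, and the event_del() call the real server would make is only noted in a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-in for memcached's curr_conns stat. */
static long curr_conns = 0;

/* Hypothetical, heavily simplified connection record. */
typedef struct {
    int sfd;           /* socket descriptor owned by this connection */
    bool conn_closed;  /* set by the first successful close */
} sketch_conn;

/* Mark a freshly accepted socket as live and count it. */
void sketch_conn_open(sketch_conn *c, int sfd) {
    assert(c != NULL);
    c->sfd = sfd;
    c->conn_closed = false;
    curr_conns++;
}

/* Close the connection exactly once. A repeated call is reported on
 * stderr instead of closing the descriptor and decrementing the
 * counter again. Returns true only if this call performed the close. */
bool sketch_conn_close(sketch_conn *c) {
    assert(c != NULL);
    if (c->conn_closed) {
        fprintf(stderr, "ERROR: Double Close [%d]\n", c->sfd);
        return false;
    }
    c->conn_closed = true;
    /* The real server would also event_del() the socket's event here. */
    close(c->sfd);
    curr_conns--;
    return true;
}
```

Opening one connection and closing it twice shows the point of the flag: the second call only prints the error and returns false, leaving curr_conns at 0. Without the check, that second call would decrement curr_conns to -1 and call close() on a descriptor number that may already have been handed to a new connection, which is why a frequent double close should surface as a negative curr_conns stat.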
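The STAT version and STAT libevent lines quoted in the thread come from the text-protocol stats command. A minimal sketch of fetching and reading them from C, assuming a plain IPv4 TCP listener such as 127.0.0.1:11211 and a reply that fits in the caller's buffer; fetch_stats() and stats_find() are hypothetical helper names, not part of memcached or any client library.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* True once buf holds a complete reply: stats output ends with an "END\r\n" line. */
static int reply_complete(const char *buf) {
    return strncmp(buf, "END\r\n", 5) == 0 || strstr(buf, "\r\nEND\r\n") != NULL;
}

/* Copy the value of a "STAT <key> <value>\r\n" line into out.
 * Returns 0 on success, -1 if the key is missing or out is too small. */
int stats_find(const char *resp, const char *key, char *out, size_t outlen) {
    size_t klen = strlen(key);
    for (const char *line = resp; line != NULL && *line != '\0';) {
        if (strncmp(line, "STAT ", 5) == 0 &&
            strncmp(line + 5, key, klen) == 0 && line[5 + klen] == ' ') {
            const char *val = line + 5 + klen + 1;
            size_t vlen = strcspn(val, "\r\n");
            if (vlen + 1 > outlen) return -1;
            memcpy(out, val, vlen);
            out[vlen] = '\0';
            return 0;
        }
        line = strchr(line, '\n');
        if (line != NULL) line++;  /* advance to the next line */
    }
    return -1;
}

/* Send "stats\r\n" to an IPv4 server and read the reply into buf
 * (NUL-terminated). Returns the number of bytes read, or -1 on error. */
ssize_t fetch_stats(const char *ip, unsigned short port, char *buf, size_t len) {
    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) return -1;

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    static const char cmd[] = "stats\r\n";
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        write(fd, cmd, sizeof cmd - 1) != (ssize_t)(sizeof cmd - 1)) {
        close(fd);
        return -1;
    }

    /* Read until END arrives, the peer closes, or the buffer is full. */
    size_t used = 0;
    while (used + 1 < len && !reply_complete(buf)) {
        ssize_t n = read(fd, buf + used, len - used - 1);
        if (n <= 0) break;
        used += (size_t)n;
        buf[used] = '\0';
    }
    close(fd);
    return (ssize_t)used;
}
```

Calling fetch_stats("127.0.0.1", 11211, buf, sizeof buf) and then stats_find(buf, "libevent", v, sizeof v) would yield 2.0.18-stable for the server whose stats are quoted above. The trailing-space check in stats_find() keeps a key like "version" from matching a longer key that merely starts with the same letters.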
