On that note, it *is* useful if you try that branch I posted, since so far as I can tell it should emulate the .17 behavior.
On Thu, 8 May 2014, dormando wrote:

> > I am just speculating, and by no means have any idea what I am really
> > talking about here. :)
> >
> > With 2 threads, still solid: no timeouts, no runaway 100% CPU. It's been
> > days. Increasing from 2 threads to 4 does not generate any more traffic
> > or requests to memcached. Thus I am speculating it is perhaps a race
> > condition of some sort, only hitting with more than 2 threads.
>
> Doesn't tell me anything useful, since I'm already looking for potential
> races and don't see any possibility outside of libevent.
>
> > Why do you say it will be less likely to happen with 2 threads than 4?
>
> Nature of race conditions: the more threads you have running, the more
> likely you are to hit them, sometimes by orders of magnitude.
>
> It doesn't really change the fact that this has worked for many years and
> the code *barely* changed recently. I just don't see it.

On Wednesday, May 7, 2014 5:38:47 PM UTC-7, Dormando wrote:

> That doesn't really tell us anything about the nature of the problem,
> though. With 2 threads it might still happen, but it is a lot less likely.

On Wed, 7 May 2014, [email protected] wrote:

> Bumped up to 2 threads and so far no timeout errors. I'm going to let it
> run for a few more days, then revert back to 4 threads and see if timeout
> errors come up again. That will tell us whether the problem lies in
> spawning more than 2 threads.

On Wednesday, May 7, 2014 5:19:13 PM UTC-7, Dormando wrote:

> Hey,
>
> try this branch:
> https://github.com/dormando/memcached/tree/double_close
>
> So far as I can tell that emulates the behavior in .17...
>
> To build:
> ./autogen.sh && ./configure && make
>
> Run it in screen like you were doing with the other tests, and see if it
> prints "ERROR: Double Close [somefd]". If it prints that once and then
> stops, I guess that's what .17 was doing... if it spams that message,
> then something else may have changed.
>
> I'm mostly convinced something about your OS or build is corrupt, but I
> have no idea what it is. The only other thing I can think of is to
> instrument .17 a bit more and have you try that (with the connection code
> laid out the old way, but with a conn_closed flag to detect a
> double-close attempt), and see if the old .17 still did it.

On Tue, 6 May 2014, [email protected] wrote:

> Changing from 4 threads to 1 seems to have resolved the problem. No
> timeouts since. Should I set it to 2 threads and wait and see how things
> go?

On Tuesday, May 6, 2014 12:07:08 AM UTC-7, Dormando wrote:

> And how'd that work out?
>
> Still no other reports :/ ... a few thousand more downloads of .19...

On Sun, 4 May 2014, [email protected] wrote:

> I'm going to try switching threads from 4 to 1. This host, web2, is the
> only one I am seeing it on, but it also is the only host that gets any
> real traffic. Super frustrating.

On Sunday, May 4, 2014 10:12:08 AM UTC-7, Dormando wrote:

> I'm stumped. (Also, your e-mails aren't updating the ticket...)
>
> It's impossible for a connection to get into the closed state without
> having event_del() and close() called on the socket. A socket slot isn't
> event_add()'ed again until after the state is reset to 'init_state'.
> There was no code path for event_del() to actually fail, so far as I
> could see.
>
> I've e-mailed Steven Grimm for ideas, but either that's not his e-mail
> anymore or he's not going to respond.
>
> I really don't know. I guess the old code would've just called conn_close
> again by accident... I don't see how the logic changed in any significant
> way in .18. Then again, if it happened with any frequency, people's
> curr_conns stat would go negative.
>
> So... either that always happened and we never noticed, or your
> particular OS is corrupt. There are probably 10,000+ installs of .18+ now
> and only one complaint, so I'm a little hesitant to spend a ton of time
> on this until we get more reports.
>
> You should downgrade to .17.
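[Editor's note: to make the invariant above concrete, here is a minimal sketch of the kind of conn_closed instrumentation dormando describes — a flag that turns a second close attempt on the same slot into a loud error instead of a silent double teardown. The struct layout and field names are illustrative stand-ins, not the actual memcached source or the double_close branch.]

    #include <stdio.h>
    #include <stdbool.h>
    #include <unistd.h>
    #include <event2/event.h>

    /* Hypothetical, simplified connection slot; memcached's real conn
     * struct is much larger and its fields are named differently. */
    struct conn {
        int sfd;            /* socket fd */
        bool closed;        /* set once teardown has already run */
        struct event *ev;   /* libevent registration for this fd */
    };

    static void conn_close(struct conn *c) {
        if (c->closed) {
            /* A second close on the same slot is exactly the bug being
             * hunted: it would tear down fd/event state that may
             * already belong to a new connection. */
            fprintf(stderr, "ERROR: Double Close [%d]\n", c->sfd);
            return;
        }
        /* The invariant from the thread: delete the event before
         * closing the socket, and don't event_add() this slot again
         * until its state has been reset for a fresh connection. */
        event_del(c->ev);
        close(c->sfd);
        c->closed = true;
    }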
On Sun, 4 May 2014, [email protected] wrote:

> Damn it, got a network timeout. CPU 3 is at 100% from memcached. Here is
> the stats output to verify we're on the new versions of memcached and
> libevent:
>
> STAT version 1.4.19
> STAT libevent 2.0.18-stable

On Saturday, May 3, 2014 11:55:31 PM UTC-7, [email protected] wrote:

> Just upgraded all 5 web servers to memcached 1.4.19 with libevent 2.0.18.
> Will advise if I see memcached timeouts. Should be good, though.
>
> Thanks so much for all the help and patience. Really appreciated.

On Friday, May 2, 2014 10:20:26 PM UTC-7, [email protected] wrote:

> Updates:
>     Status: Invalid
>
> Comment #20 on issue 363 by [email protected]: MemcachePool::get():
> Server 127.0.0.1 (tcp 11211, udp 0) failed with: Network timeout
> http://code.google.com/p/memcached/issues/detail?id=363
>
> Any repeat crashes? I'm going to close this. It looks like Remi shipped
> .19. Reopen this, or open a new one, if it hangs in the same way
> somehow...
>
> Well, .19 won't be printing anything, and it won't hang, but if it's
> actually our bug and not libevent's, it would end up spinning CPU. Keep
> an eye out, I guess.
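[Editor's note: for anyone repeating the verification step above, the version and libevent lines can be checked over the plain ASCII protocol by sending "stats" and filtering the reply. A minimal sketch follows; the host, port, and single-buffer read are simplifying assumptions — only the stats command and the STAT lines shown above come from the thread.]

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    int main(void) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) { perror("socket"); return 1; }

        struct sockaddr_in addr = {0};
        addr.sin_family = AF_INET;
        addr.sin_port = htons(11211);          /* default memcached port */
        inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);

        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            perror("connect");
            return 1;
        }

        const char *cmd = "stats\r\n";
        if (write(fd, cmd, strlen(cmd)) < 0) { perror("write"); return 1; }

        /* Read until the terminating END line; assumes the whole reply
         * fits in one buffer, which is fine for a diagnostic sketch. */
        char buf[16384];
        size_t total = 0;
        ssize_t n;
        while ((n = read(fd, buf + total, sizeof(buf) - 1 - total)) > 0) {
            total += (size_t)n;
            buf[total] = '\0';
            if (strstr(buf, "END\r\n")) break;
        }

        /* Print only the lines that identify the running build. */
        for (char *line = strtok(buf, "\r\n"); line;
             line = strtok(NULL, "\r\n")) {
            if (strstr(line, "STAT version") || strstr(line, "STAT libevent"))
                printf("%s\n", line);
        }

        close(fd);
        return 0;
    }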
