Re: Issue 363 in memcached: MemcachePool::get(): Server 127.0.0.1 (tcp 11211, udp 0) failed with: Network timeout

memcached Wed, 30 Apr 2014 00:44:44 -0700

Updates:
        Status: Accepted

Comment #15 on issue 363 by [email protected]: MemcachePool::get(): Server 127.0.0.1 (tcp 11211, udp 0) failed with: Network timeout

http://code.google.com/p/memcached/issues/detail?id=363


So it turned out that c->state was set to "conn_closed".

Looking at this a bit more closely:

conn_closed is only ever set from conn_close().

conn_close() is only called from two spots: once, before drive_machine() is entered (and early returned from), and once from within drive_machine(), directly above the conn_closed case and with a stop/break before it:


        case conn_closing:
            if (IS_UDP(c->transport))
                conn_cleanup(c);
            else
                conn_close(c);
            stop = true;
            break;

        case conn_closed:

... conn_close() always deletes the event from the stack, closes the filehandle, etc.


So, I don't see how this could happen... yet...

None of the code changed between 1.4.17 and 1.4.18 seems to add new paths which could cause a connection to re-fire.

If conn_close() is called there's no way for that to loop again (stop = true).

What would have to happen is the event_handler firing again on the closed connection, for which the fd is closed and the event is deleted, but the memory not reused just yet... It would then enter drive_machine() with the state already set to conn_closed, never trip a stop = true, and not assert since it's not a debug binary.
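To make that failure mode concrete, here is a minimal model of the loop. It is a sketch, not memcached's actual code: model_state, model_conn, model_close(), model_drive_machine() and the max_iters cap are all made up for illustration, and it assumes the closed case does nothing at all in a non-debug build.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, stripped-down model; names are not from memcached. */
enum model_state { MODEL_CLOSING, MODEL_CLOSED };

struct model_conn {
    enum model_state state;
};

/* Stands in for conn_close(): marks the connection as closed. */
static void model_close(struct model_conn *c) {
    c->state = MODEL_CLOSED;
}

/*
 * Runs the state loop until some case sets stop, or until max_iters
 * is reached. Returns the number of iterations executed; a return
 * value equal to max_iters means no case ever set stop.
 */
static int model_drive_machine(struct model_conn *c, int max_iters) {
    bool stop = false;
    int iters = 0;

    assert(max_iters > 0);

    while (!stop && iters < max_iters) {
        iters++;
        switch (c->state) {
        case MODEL_CLOSING:
            model_close(c);
            stop = true;
            break;
        case MODEL_CLOSED:
            /* Assumption: a debug build would assert here. A release
             * build does nothing, so stop is never set. */
            break;
        }
    }
    return iters;
}
```

In this model, driving a connection that starts in MODEL_CLOSING takes one iteration and leaves it in MODEL_CLOSED. Driving that same connection a second time never sets stop and only ends because of the artificial max_iters cap, which the real loop does not have.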

Which is fucking terrifying. If this happened in the old code it'd just keep running into conn_closing and re-closing itself. Though I was pretty sure that calling event_del() twice causes a crash.

The other possibility is that a UDP socket is getting closed... except that also deletes the event, and closes the socket, so no new events should happen.

I did a quick test and added a second conn_close() call and... it doesn't cause a crash. It causes the curr_connections counter to slowly drift. That is wild.
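A rough sketch of why a repeated close could make a counter drift, under the assumption that the close path adjusts the counter unconditionally on every call. Everything here (counted_stats, counted_conn, close_unguarded(), close_guarded()) is hypothetical, and the guarded variant is just one possible way to make a close idempotent, not a description of the pushed commit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; not memcached's real structures. */
struct counted_stats {
    long curr_connections;
};

struct counted_conn {
    bool closed;
};

/* Assumed behaviour: every call decrements the counter, even if the
 * connection was already closed. */
static void close_unguarded(struct counted_stats *s, struct counted_conn *c) {
    c->closed = true;
    s->curr_connections--;
}

/* Idempotent variant: a second call on the same connection is a no-op. */
static void close_guarded(struct counted_stats *s, struct counted_conn *c) {
    if (c->closed)
        return;
    c->closed = true;
    s->curr_connections--;
}
```

With one open connection, calling close_unguarded() twice leaves the counter off by one, while calling close_guarded() twice leaves it at zero. Nothing crashes in either case, which matches how quietly this kind of bug can go unnoticed.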

I just pushed: https://github.com/memcached/memcached/commit/ee961e456457728ba78057961eca357edaea1ec1


...up to master.

I'm still a bit suspicious that I'm missing something important here... so I'm not doing a release tonight, but I might do one early tomorrow anyway.

Reporter is currently running a version of this patch in production; if his instance hangs up again, or doesn't self-recover once hitting the condition, we'll have a better idea I guess.

