I am also having the "connection problem" and eventually reached this thread.
Can I ask some questions? What do you mean by this?

> If you're running single-threaded, you'll need to switch to multi-threaded
> to get this benefit. A dedicated accept thread is a common design pattern
> memcached just hadn't adopted until just now.

Is the server not multi-threaded by default, or do I misunderstand your words?
Your fix is applied to the server, so the client does not need to change?
More importantly, can I know when this fix will be released?

Thanks

On Aug 31, 3:48 PM, dormando <[EMAIL PROTECTED]> wrote:
> Hey John!
>
> (Don/Darryl, read this too, please?)
>
> Thanks for the detailed report. I've been busy with work all week and
> wanted to give this a non-hand-wavy response.
>
> As far as connection issues go, and that specific explanation of the
> SYN/ACK actually happening but really late, I believe this is the only
> possible fix:
>
> http://consoleninja.net/gitweb/gitweb.cgi?p=memcached.git;a=commitdif...
>
> Been kicking around stable tree tonight:
>
> http://consoleninja.net/gitweb/gitweb.cgi?p=memcached.git;a=shortlog;...
>
> ... I'll hopefully send a followup soon about the facebook patches. Bear
> with us for now :)
>
> You didn't see any TCP retries during that window, etc? My gut tells me
> this fix will repair some timeouts, but the odds of accept not kicking in
> for a full second are too low.
>
> If you're running single-threaded, you'll need to switch to multi-threaded
> to get this benefit. A dedicated accept thread is a common design pattern
> memcached just hadn't adopted until just now.
>
> Any chance you/Don/etc could run the stable tree on a box or two and see
> if it removes *or* reduces the connection timeout?
>
> -Dormando
>
> On Tue, 19 Aug 2008, John Allspaw wrote:
> > Hello.
> >
> > We've seen connection issues with memcached for a while now, and the cause
> > is elusive.
> > I'd love for it to be a fault in the network, and have been
> > biased in looking for that to be the cause, but I can't find anything up in
> > tcp/ip land to be the culprit.
> >
> > The client logs an error like this:
> > www100.flickr [19/Aug/2008:13:47:41 +0000] [error] [client x.x.x.x]
> > [app_warn] [php] WARNING: connect() [<a
> > href='function.connect'>function.connect</a>]: Can't connect to woe1:11211,
> > Connection failed (0) in <php script> line 287
> >
> > A tcpdump shows it in the wild: client sends a SYN, memcached server takes > 5
> > seconds to return a SYN/ACK, at which point the client gives memcached
> > the finger via an RST packet:
> >
> > No.  Time      Source           Destination      Protocol  src port  dst port  Info
> > 165  1.262255  209.191.105.168  68.142.214.227   TCP       9048      11211     9048 > 11211 [SYN] Seq=0 Len=0 MSS=1460 WS=8
> > 738  5.093003  68.142.214.227   209.191.105.168  TCP       11211     9048      11211 > 9048 [SYN, ACK] Seq=0 Ack=1 Win=373760 Len=0 MSS=1460 WS=6
> > 739  5.093016  209.191.105.168  68.142.214.227   TCP       9048      11211     9048 > 11211 [RST] Seq=1 Len=0
> >
> > The client has PECL php client memcache-3.0.1, server is memcached-1.2.6.
> >
> > An mtr run for 24 hours shows no packet loss between the two machines, and
> > the issue isn't port/switch/host specific, since we see this same issue from
> > all of our front-end machines, across all of our memcached servers, which
> > span several racks and switches. The client connects via IP, so no DNS is
> > needed.
> >
> > No firewalls/iptables/connection tracking/etc running on either client or
> > server.
> >
> > Any thoughts? We're handling the connection failures, but it's annoying and
> > I can't help but think there's something stupid going on.
> >
> > More detail on both client and server below.
> >
> > thanks,
> > allspaw
> >
> > server is: RHEL 4U2 2.6.9-22.ELsmp #1 SMP Mon Sep 19 18:32:14 EDT 2005 i686 i686 i386 GNU/Linux
> > client is: RHEL 4U4 2.6.9-42.ELsmp #1 SMP Wed Jul 12 23:27:17 EDT 2006 i686 i686 i386 GNU/Linux
> >
> > lsmod for server is:
> > Module           Size    Used by
> > md5              8001    1
> > ipv6             240097  609
> > i2c_dev          14273   0
> > i2c_core         25921   1 i2c_dev
> > nfs              199205  1
> > lockd            65257   2 nfs
> > sunrpc           139173  4 nfs,lockd
> > dm_mirror        28449   0
> > dm_mod           58949   1 dm_mirror
> > uhci_hcd         32729   0
> > ehci_hcd         31813   0
> > e1000            96429   0
> > floppy           58065   0
> > aic79xx          187485  0
> > ext3             118729  1
> > jbd              59481   1 ext3
> > sata_sil         12869   0
> > ata_piix         13253   0
> > libata           47901   2 sata_sil,ata_piix
> > megaraid_mbox    37073   0
> > megaraid_mm      17905   1 megaraid_mbox
> > sd_mod           20545   0
> > scsi_mod         116429  4 aic79xx,libata,megaraid_mbox,sd_mod
> >
> > lsmod for client is:
> > Module           Size    Used by
> > ylock            17568   2
> > md5              8001    1
> > ipv6             241761  28
> > i2c_dev          14273   0
> > i2c_core         25921   1 i2c_dev
> > button           10449   0
> > battery          12869   0
> > ac               8773    0
> > joydev           14209   0
> > uhci_hcd         32729   0
> > ehci_hcd         32069   0
> > tg3              100933  0
> > dm_snapshot      21093   0
> > dm_zero          6337    0
> > dm_mirror        31645   0
> > ext3             118729  4
> > jbd              59609   1 ext3
> > dm_mod           60357   12 dm_snapshot,dm_zero,dm_mirror
> > mptscsih         5569    0
> > mptsas           13389   3 mptscsih
> > mptspi           13261   1 mptscsih
> > mptfc            12617   1 mptscsih
> > mptscsi          44125   3 mptsas,mptspi,mptfc
> > mptbase          61345   4 mptsas,mptspi,mptfc,mptscsi
> > sd_mod           20545   3
> > scsi_mod         117709  5 mptsas,mptspi,mptfc,mptscsi,sd_mod
> >
> > sysctl -a | grep tcp for both client and server shows:
> > sunrpc.tcp_slot_table_entries = 16
> > net.ipv4.tcp_bic_beta = 819
> > net.ipv4.tcp_tso_win_divisor = 8
> > net.ipv4.tcp_moderate_rcvbuf = 1
> > net.ipv4.tcp_bic_low_window = 14
> > net.ipv4.tcp_bic_fast_convergence = 1
> > net.ipv4.tcp_bic = 1
> > net.ipv4.tcp_vegas_gamma = 2
> > net.ipv4.tcp_vegas_beta = 6
> > net.ipv4.tcp_vegas_alpha = 2
> > net.ipv4.tcp_vegas_cong_avoid = 0
> > net.ipv4.tcp_westwood = 0
> > net.ipv4.tcp_no_metrics_save = 0
> > net.ipv4.tcp_low_latency = 0
> > net.ipv4.tcp_frto = 0
> > net.ipv4.tcp_tw_reuse = 0
> > net.ipv4.tcp_adv_win_scale = 2
> > net.ipv4.tcp_app_win = 31
> > net.ipv4.tcp_rmem = 8192 873800 8738000
> > net.ipv4.tcp_wmem = 4096 655360 6553600
> > net.ipv4.tcp_mem = 786432 1048576 1572864
> > net.ipv4.tcp_dsack = 1
> > net.ipv4.tcp_ecn = 0
> > net.ipv4.tcp_reordering = 3
> > net.ipv4.tcp_fack = 1
> > net.ipv4.tcp_orphan_retries = 0
> > net.ipv4.tcp_max_syn_backlog = 8192
> > net.ipv4.tcp_rfc1337 = 0
> > net.ipv4.tcp_stdurg = 0
> > net.ipv4.tcp_abort_on_overflow = 0
> > net.ipv4.tcp_tw_recycle = 1
> > net.ipv4.tcp_syncookies = 1
> > net.ipv4.tcp_fin_timeout = 10
> > net.ipv4.tcp_retries2 = 15
> > net.ipv4.tcp_retries1 = 3
> > net.ipv4.tcp_keepalive_intvl = 75
> > net.ipv4.tcp_keepalive_probes = 9
> > net.ipv4.tcp_keepalive_time = 7200
> > net.ipv4.tcp_max_tw_buckets = 180000
> > net.ipv4.tcp_max_orphans = 262144
> > net.ipv4.tcp_synack_retries = 5
> > net.ipv4.tcp_syn_retries = 5
> > net.ipv4.tcp_retrans_collapse = 0
> > net.ipv4.tcp_sack = 1
> > net.ipv4.tcp_window_scaling = 1
> > net.ipv4.tcp_timestamps = 0
> > fs.nfs.nlm_tcpport = 0
> >
> > --
> > John Allspaw
> > http://flickr.com/photos/allspaw
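
For anyone else landing on this thread with the same question: the "dedicated accept thread" pattern mentioned in the quoted reply can be sketched roughly as below. This is only an illustration in Python, not memcached's actual C code. The names `start_server` and `echo_once`, the echo behaviour, and the worker count are all made up for the example. The idea is that one thread does nothing except accept() and hand sockets to workers, so a busy worker should not hold up new connections.

```python
import queue
import socket
import threading


def start_server(host="127.0.0.1", port=0, workers=2):
    """Start a toy echo server with one dedicated accept thread.

    Hypothetical helper for illustration; returns the listening socket
    and the queue used to hand connections to the workers.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))  # port=0 lets the OS pick a free port
    listener.listen(128)
    pending = queue.Queue()

    def accept_loop():
        # This thread only accepts and enqueues, so slow requests in the
        # workers do not delay picking up new connections.
        while True:
            try:
                conn, _addr = listener.accept()
            except OSError:
                break  # listening socket was closed
            pending.put(conn)

    def worker_loop():
        # Each worker serves one connection at a time: read once, echo back.
        while True:
            conn = pending.get()
            with conn:
                data = conn.recv(1024)
                if data:
                    conn.sendall(data)

    threads = [threading.Thread(target=accept_loop, daemon=True)]
    threads += [
        threading.Thread(target=worker_loop, daemon=True) for _ in range(workers)
    ]
    for thread in threads:
        thread.start()
    return listener, pending


def echo_once(address, payload):
    """Connect, send one small payload, and return the reply (hypothetical client)."""
    with socket.create_connection(address, timeout=2) as client:
        client.sendall(payload)
        return client.recv(1024)
```

A quick way to try it is `listener, _ = start_server()` followed by `echo_once(listener.getsockname(), b"ping")`. A single-threaded server would instead call accept() from the same loop that serves requests, which is the situation the quoted reply says needs the switch to multi-threaded mode.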
