I am also seeing the "connection problem" and eventually found this
thread.

Can I ask a few questions?

What do you mean by this?
<quote>
> If you're running single-threaded, you'll need to switch to multi-threaded
> to get this benefit. A dedicated accept thread is a common design pattern
> memcached just hadn't adopted until just now.
</quote>

Is the server not multi-threaded by default, or am I misunderstanding
you?
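
As far as I understand it, the "dedicated accept thread" pattern looks roughly like the sketch below: one thread does nothing but call accept() and hand each new connection to a pool of worker threads, so a busy worker cannot delay new connections. This is only an illustration in Python, not memcached's actual C code; the names `accept_loop`, `worker_loop` and `start_server` and the toy echo handler are made up for the example.

```python
import queue
import socket
import threading


def accept_loop(listener, conn_queue, stop_event):
    """Dedicated accept thread: only accepts and hands sockets to workers."""
    listener.settimeout(0.2)  # wake up regularly so stop_event is noticed
    while not stop_event.is_set():
        try:
            conn, _addr = listener.accept()
        except socket.timeout:
            continue
        except OSError:
            break  # listener was closed underneath us
        conn.setblocking(True)  # workers use plain blocking I/O
        conn_queue.put(conn)


def worker_loop(conn_queue):
    """Worker thread: serves connections handed over by the accept thread."""
    while True:
        conn = conn_queue.get()
        if conn is None:  # sentinel: shut down
            break
        with conn:
            data = conn.recv(1024)
            if data:
                conn.sendall(data)  # toy echo standing in for request handling


def start_server(host="127.0.0.1", port=0, num_workers=4):
    """Start the listener, the workers and the accept thread.

    Returns (bound_port, stop_function).
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen(128)

    conn_queue = queue.Queue()
    stop_event = threading.Event()
    workers = [
        threading.Thread(target=worker_loop, args=(conn_queue,), daemon=True)
        for _ in range(num_workers)
    ]
    acceptor = threading.Thread(
        target=accept_loop, args=(listener, conn_queue, stop_event), daemon=True
    )
    for thread in workers + [acceptor]:
        thread.start()

    def stop():
        stop_event.set()
        acceptor.join()
        for _ in workers:
            conn_queue.put(None)
        for thread in workers:
            thread.join()
        listener.close()

    return listener.getsockname()[1], stop
```

The point of the design is that the accept thread never does request work, so the time between a client's SYN and the server's accept() should not depend on how busy the workers are.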

Your fix is applied to the server, so the client does not need to
change, right?
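
If the change is server-side only, one way to tell whether it helps would be to time connects from a client box before and after. Below is a rough sketch under that assumption, in Python using only the standard library; `time_connect` and `summarize` are hypothetical helpers written for this example and are not part of any memcached client.

```python
import socket
import time


def time_connect(host, port, timeout=1.0):
    """Open and close one TCP connection.

    Returns (elapsed_seconds, error), where error is None on success.
    """
    start = time.monotonic()
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:  # includes timeouts and refused connections
        return time.monotonic() - start, exc
    sock.close()
    return time.monotonic() - start, None


def summarize(samples, slow_threshold=0.1):
    """Count failed and slow connects in a list of (seconds, error) samples."""
    failed = sum(1 for _secs, err in samples if err is not None)
    slow = sum(
        1 for secs, err in samples if err is None and secs > slow_threshold
    )
    longest = max((secs for secs, _err in samples), default=0.0)
    return {
        "total": len(samples),
        "failed": failed,
        "slow": slow,
        "max_seconds": longest,
    }
```

For example, `summarize([time_connect("woe1", 11211) for _ in range(1000)])` run against one server before and after the upgrade would give comparable counts of failed and slow connects.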

More importantly, do you know when this fix will be released?

thanks

On Aug 31, 3:48 PM, dormando <[EMAIL PROTECTED]> wrote:
> Hey John!
>
> (Don/Darryl, read this too, please?)
>
> Thanks for the detailed report. I've been busy with work all week and
> wanted to give this a non-hand-wavy response.
>
> As far as connection issues go, and that specific explanation of the
> SYN/ACK actually happening but really late, I believe this is the only
> possible fix:
>
> http://consoleninja.net/gitweb/gitweb.cgi?p=memcached.git;a=commitdif...
>
> Been kicking around stable tree tonight:
>
> http://consoleninja.net/gitweb/gitweb.cgi?p=memcached.git;a=shortlog;...
>
> ... I'll hopefully send a followup soon about the facebook patches. Bear
> with us for now :)
>
> You didn't see any TCP retries during that window, etc? My gut tells me
> this fix will repair some timeouts, but the odds of accept not kicking in
> for a full second are too low.
>
> If you're running single-threaded, you'll need to switch to multi-threaded
> to get this benefit. A dedicated accept thread is a common design pattern
> memcached just hadn't adopted until just now.
>
> Any chance you/Don/etc could run the stable tree on a box or two and see
> if it removes *or* reduces the connection timeout?
>
> -Dormando
>
> On Tue, 19 Aug 2008, John Allspaw wrote:
> > Hello.
>
> > We've seen connection issues with memcached for a while now, and the cause
> > is elusive. I'd love for it to be a fault in the network, and have been
> > biased in looking for that to be the cause, but I can't find anything up in
> > tcp/ip land to be the culprit.
>
> > The client logs an error like this:
> > www100.flickr [19/Aug/2008:13:47:41 +0000] [error] [client x.x.x.x]
> > [app_warn] [php] WARNING: connect() [<a
> > href='function.connect'>function.connect</a>]: Can't connect to woe1:11211,
> > Connection failed (0) in <php script> line 287
>
> > A tcpdump shows it in the wild: client sends a SYN, memcached server takes
> > more than 5 seconds to return a SYN/ACK, at which point the client gives
> > memcached the finger via an RST packet:
>
> > No.  Time      Source           Destination      Protocol  src port  dst port  Info
> > 165  1.262255  209.191.105.168  68.142.214.227   TCP       9048      11211     9048 > 11211 [SYN] Seq=0 Len=0 MSS=1460 WS=8
> > 738  5.093003  68.142.214.227   209.191.105.168  TCP       11211     9048      11211 > 9048 [SYN, ACK] Seq=0 Ack=1 Win=373760 Len=0 MSS=1460 WS=6
> > 739  5.093016  209.191.105.168  68.142.214.227   TCP       9048      11211     9048 > 11211 [RST] Seq=1 Len=0
>
> > The client has PECL php client memcache-3.0.1, server is memcached-1.2.6.
>
> > An mtr run for 24 hours shows no packet loss between the two machines, and
> > the issue isn't port/switch/host specific, since we see this same issue from
> > all of our front-end machines, across all of our memcached servers, which
> > span several racks and switches.  The client connects via IP, so no DNS is
> > needed.
>
> > No firewalls/iptables/connection tracking/etc running on either client or
> > server.
>
> > Any thoughts? We're handling the connection failures, but it's annoying and
> > I can't help but think there's something stupid going on.
> > More detail on both client and server below.
>
> > thanks,
> > allspaw
>
> > server is: RHEL 4U2 2.6.9-22.ELsmp #1 SMP Mon Sep 19 18:32:14 EDT 2005 i686
> > i686 i386 GNU/Linux
> > client is: RHEL 4U4 2.6.9-42.ELsmp #1 SMP Wed Jul 12 23:27:17 EDT 2006 i686
> > i686 i386 GNU/Linux
>
> > lsmod for server is:
> > Module                  Size  Used by
> > md5                     8001  1
> > ipv6                  240097  609
> > i2c_dev                14273  0
> > i2c_core               25921  1 i2c_dev
> > nfs                   199205  1
> > lockd                  65257  2 nfs
> > sunrpc                139173  4 nfs,lockd
> > dm_mirror              28449  0
> > dm_mod                 58949  1 dm_mirror
> > uhci_hcd               32729  0
> > ehci_hcd               31813  0
> > e1000                  96429  0
> > floppy                 58065  0
> > aic79xx               187485  0
> > ext3                  118729  1
> > jbd                    59481  1 ext3
> > sata_sil               12869  0
> > ata_piix               13253  0
> > libata                 47901  2 sata_sil,ata_piix
> > megaraid_mbox          37073  0
> > megaraid_mm            17905  1 megaraid_mbox
> > sd_mod                 20545  0
> > scsi_mod              116429  4 aic79xx,libata,megaraid_mbox,sd_mod
>
> > lsmod for client is:
> > Module                  Size  Used by
> > ylock                  17568  2
> > md5                     8001  1
> > ipv6                  241761  28
> > i2c_dev                14273  0
> > i2c_core               25921  1 i2c_dev
> > button                 10449  0
> > battery                12869  0
> > ac                      8773  0
> > joydev                 14209  0
> > uhci_hcd               32729  0
> > ehci_hcd               32069  0
> > tg3                   100933  0
> > dm_snapshot            21093  0
> > dm_zero                 6337  0
> > dm_mirror              31645  0
> > ext3                  118729  4
> > jbd                    59609  1 ext3
> > dm_mod                 60357  12 dm_snapshot,dm_zero,dm_mirror
> > mptscsih                5569  0
> > mptsas                 13389  3 mptscsih
> > mptspi                 13261  1 mptscsih
> > mptfc                  12617  1 mptscsih
> > mptscsi                44125  3 mptsas,mptspi,mptfc
> > mptbase                61345  4 mptsas,mptspi,mptfc,mptscsi
> > sd_mod                 20545  3
> > scsi_mod              117709  5 mptsas,mptspi,mptfc,mptscsi,sd_mod
>
> > sysctl -a | grep tcp for both client and server shows:
> > sunrpc.tcp_slot_table_entries = 16
> > net.ipv4.tcp_bic_beta = 819
> > net.ipv4.tcp_tso_win_divisor = 8
> > net.ipv4.tcp_moderate_rcvbuf = 1
> > net.ipv4.tcp_bic_low_window = 14
> > net.ipv4.tcp_bic_fast_convergence = 1
> > net.ipv4.tcp_bic = 1
> > net.ipv4.tcp_vegas_gamma = 2
> > net.ipv4.tcp_vegas_beta = 6
> > net.ipv4.tcp_vegas_alpha = 2
> > net.ipv4.tcp_vegas_cong_avoid = 0
> > net.ipv4.tcp_westwood = 0
> > net.ipv4.tcp_no_metrics_save = 0
> > net.ipv4.tcp_low_latency = 0
> > net.ipv4.tcp_frto = 0
> > net.ipv4.tcp_tw_reuse = 0
> > net.ipv4.tcp_adv_win_scale = 2
> > net.ipv4.tcp_app_win = 31
> > net.ipv4.tcp_rmem = 8192        873800  8738000
> > net.ipv4.tcp_wmem = 4096        655360  6553600
> > net.ipv4.tcp_mem = 786432       1048576 1572864
> > net.ipv4.tcp_dsack = 1
> > net.ipv4.tcp_ecn = 0
> > net.ipv4.tcp_reordering = 3
> > net.ipv4.tcp_fack = 1
> > net.ipv4.tcp_orphan_retries = 0
> > net.ipv4.tcp_max_syn_backlog = 8192
> > net.ipv4.tcp_rfc1337 = 0
> > net.ipv4.tcp_stdurg = 0
> > net.ipv4.tcp_abort_on_overflow = 0
> > net.ipv4.tcp_tw_recycle = 1
> > net.ipv4.tcp_syncookies = 1
> > net.ipv4.tcp_fin_timeout = 10
> > net.ipv4.tcp_retries2 = 15
> > net.ipv4.tcp_retries1 = 3
> > net.ipv4.tcp_keepalive_intvl = 75
> > net.ipv4.tcp_keepalive_probes = 9
> > net.ipv4.tcp_keepalive_time = 7200
> > net.ipv4.tcp_max_tw_buckets = 180000
> > net.ipv4.tcp_max_orphans = 262144
> > net.ipv4.tcp_synack_retries = 5
> > net.ipv4.tcp_syn_retries = 5
> > net.ipv4.tcp_retrans_collapse = 0
> > net.ipv4.tcp_sack = 1
> > net.ipv4.tcp_window_scaling = 1
> > net.ipv4.tcp_timestamps = 0
> > fs.nfs.nlm_tcpport = 0
>
> > --
> > John Allspaw
> > http://flickr.com/photos/allspaw
