That patch solves the problem, at least for ICMP port unreachable packets. I tested ICMP port unreachable packets without the patch and, like ICMP must-fragment packets, they were not being forwarded with conntrack=1. So it all looks good.
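For anyone wanting to repeat this kind of test: one simple way to elicit an ICMP port unreachable is to send a UDP datagram to a port with no listener. On Linux, a *connected* UDP socket surfaces the resulting ICMP error as ECONNREFUSED on a subsequent send. This is only an illustrative sketch, not from the thread; the loopback address and port 9 are arbitrary choices (any closed UDP port on the target works).

```c
#include <arpa/inet.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    /* Assumption: UDP port 9 on the target has no listener. */
    struct sockaddr_in dst = { 0 };
    dst.sin_family = AF_INET;
    dst.sin_port = htons(9);
    inet_pton(AF_INET, "127.0.0.1", &dst.sin_addr);

    int s = socket(AF_INET, SOCK_DGRAM, 0);
    if (s < 0 || connect(s, (struct sockaddr *)&dst, sizeof(dst)) < 0) {
        perror("setup");
        return 1;
    }

    /* The first datagram elicits an ICMP port unreachable; Linux queues
     * the error on the connected socket, so a later send() fails with
     * ECONNREFUSED. Retry a few times to give the ICMP time to arrive. */
    int got_refused = 0;
    for (int i = 0; i < 20 && !got_refused; i++) {
        if (send(s, "x", 1, 0) < 0 && errno == ECONNREFUSED)
            got_refused = 1;
        usleep(50 * 1000);
    }
    puts(got_refused ? "got ECONNREFUSED (ICMP port unreachable seen)"
                     : "no ICMP error observed");
    close(s);
    return 0;
}
```

Run it from a client behind the director and point `dst` at the VIP; if the director forwards the ICMP, the client socket sees ECONNREFUSED.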
The port unreachable test is a really useful way to test ICMP forwarding. I will be using that in the future! Thank you for your help.

Tim

On Tue, 11 Sep 2012, Julian Anastasov wrote:

>
> 	Hello,
>
> On Mon, 10 Sep 2012, l...@elwe.co.uk wrote:
>
>> I have a number of LVS directors running a mixture of CentOS 5 and CentOS
>> 6 (running kernels 2.6.18-238.5.1 and 2.6.32-71.29.1). I have applied the
>> ipvs-nfct patch to the kernel(s).
>>
>> When I set /proc/sys/net/ipv4/vs/conntrack to 1 I have PMTU issues. When
>> it is set to 0 the issues go away. The issue is when a client on a network
>> with a <1500 byte MTU connects. One of my real servers replies to the
>> client's request with a 1500 byte packet, and a device upstream of the
>> client sends an ICMP must-fragment. When conntrack=0 the director passes
>> the (modified) ICMP packet on to the client. When conntrack=1 the
>> director doesn't send an ICMP to the real server. I can toggle conntrack
>> and watch PMTU discovery work and not work.
>
> 	I can try to reproduce it with a recent kernel.
> Can you tell me what forwarding method is used? NAT? Do
> you have a test environment, so that you can see what
> is shown in the logs when IPVS debugging is enabled?
>
> 	Do you mean that when conntrack=0 ICMP is forwarded
> back to the client instead of being forwarded to the real server?
>
> 	Now I remember some problems with ICMP:
>
> - I don't see this change in 2.6.32-71.29.1:
>
> commit b0aeef30433ea6854e985c2e9842fa19f51b95cc
> Author: Julian Anastasov <j...@ssi.bg>
> Date:   Mon Oct 11 11:23:07 2010 +0300
>
>     nf_nat: restrict ICMP translation for embedded header
>
>     Skip ICMP translation of the embedded protocol header
>     if the NAT bits are not set. Needed for IPVS to see the original
>     embedded addresses, because for IPVS traffic the IPS_SRC_NAT_BIT
>     and IPS_DST_NAT_BIT bits are not set. That happens when IPVS performs
>     DNAT for client packets after using nf_conntrack_alter_reply
>     to expect replies from the real server.
>
> Signed-off-by: Julian Anastasov <j...@ssi.bg>
> Signed-off-by: Simon Horman <ho...@verge.net.au>
>
> diff --git a/net/ipv4/netfilter/nf_nat_core.c b/net/ipv4/netfilter/nf_nat_core.c
> index e2e00c4..0047923 100644
> --- a/net/ipv4/netfilter/nf_nat_core.c
> +++ b/net/ipv4/netfilter/nf_nat_core.c
> @@ -462,6 +462,18 @@ int nf_nat_icmp_reply_translation(struct nf_conn *ct,
>  		return 0;
>  	}
>
> +	if (manip == IP_NAT_MANIP_SRC)
> +		statusbit = IPS_SRC_NAT;
> +	else
> +		statusbit = IPS_DST_NAT;
> +
> +	/* Invert if this is reply dir. */
> +	if (dir == IP_CT_DIR_REPLY)
> +		statusbit ^= IPS_NAT_MASK;
> +
> +	if (!(ct->status & statusbit))
> +		return 1;
> +
>  	pr_debug("icmp_reply_translation: translating error %p manip %u "
>  		 "dir %s\n", skb, manip,
>  		 dir == IP_CT_DIR_ORIGINAL ? "ORIG" : "REPLY");
> @@ -496,20 +508,9 @@ int nf_nat_icmp_reply_translation(struct nf_conn *ct,
>
>  	/* Change outer to look the reply to an incoming packet
>  	 * (proto 0 means don't invert per-proto part). */
> -	if (manip == IP_NAT_MANIP_SRC)
> -		statusbit = IPS_SRC_NAT;
> -	else
> -		statusbit = IPS_DST_NAT;
> -
> -	/* Invert if this is reply dir. */
> -	if (dir == IP_CT_DIR_REPLY)
> -		statusbit ^= IPS_NAT_MASK;
> -
> -	if (ct->status & statusbit) {
> -		nf_ct_invert_tuplepr(&target, &ct->tuplehash[!dir].tuple);
> -		if (!manip_pkt(0, skb, 0, &target, manip))
> -			return 0;
> -	}
> +	nf_ct_invert_tuplepr(&target, &ct->tuplehash[!dir].tuple);
> +	if (!manip_pkt(0, skb, 0, &target, manip))
> +		return 0;
>
>  	return 1;
>  }
>
> 	If this patch does not help we have to debug it
> somehow.
>
>> I would happily leave conntrack off, but it has a huge performance impact.
>> With my traffic profile the softirq load doubles when I turn off
>> conntrack. My busiest director is doing 2.1Gb of traffic and with
>> conntrack off it can probably only handle 2.5Gb.
>
> 	It is interesting to know about such a comparison
> for conntrack=0 and 1. Can you confirm both numbers again?
> 2.1 is not better than 2.5.
>
>> I am hoping that this issue has been observed and fixed and someone will
>> be able to point me to the patch so I can backport it to my kernels (or
>> finally get rid of CentOS 5!).
>>
>> Thanks
>> Tim
>
> Regards
>
> --
> Julian Anastasov <j...@ssi.bg>
>
> _______________________________________________
> Please read the documentation before posting - it's available at:
> http://www.linuxvirtualserver.org/
>
> LinuxVirtualServer.org mailing list - lvs-users@LinuxVirtualServer.org
> Send requests to lvs-users-requ...@linuxvirtualserver.org
> or go to http://lists.graemef.net/mailman/listinfo/lvs-users