[Kernel-packages] [Bug 1975649] [NEW] flowtable: fix TCP flow teardown

Bodong Wang Tue, 24 May 2022 16:15:48 -0700

Public bug reported:

* Explain the feature


This patch addresses three possible problems:

1. ct gc may race to undo the timeout adjustment of the packet path, leaving
   the conntrack entry in place with the internal offload timeout (one day).

2. ct gc removes the ct because the IPS_OFFLOAD_BIT is not set and the CLOSE
   timeout is reached before the flow offload del.

3. The tcp ct is always set to ESTABLISHED with a very long timeout
   in flow offload teardown/delete, even though the state might already
   be CLOSED. Also, we cannot assume that the FIN or RST packet hits
   flow table teardown, as the packet might get bumped to the slow path
   in nftables.
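
As a rough aid for recognising problem 1 in practice, the sketch below
checks whether a remaining conntrack timeout sits just under the one-day
offload timeout mentioned above. The function name and the 300 second
slack are assumptions made for illustration; they do not come from the
patch.

```shell
# One day in seconds, the internal offload timeout named in problem 1.
ONE_DAY=$((24 * 60 * 60))

# near_offload_timeout TIMEOUT [SLACK]
# Succeeds when TIMEOUT (seconds) lies within SLACK seconds below one day.
# The default slack of 300 s is an arbitrary, illustrative choice.
near_offload_timeout() {
    slack=${2:-300}
    [ "$1" -le "$ONE_DAY" ] && [ "$1" -gt $((ONE_DAY - slack)) ]
}
```

For example, `near_offload_timeout 86354` succeeds, while a normal
closing-state timeout of a few seconds or minutes does not.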

This patch resets IPS_OFFLOAD_BIT from flow_offload_teardown(), so
conntrack handles the tcp rst/fin packet which triggers the CLOSE/FIN
state transition.

Moreover, this patch returns the connection's ownership to conntrack
upon teardown by clearing the offload flag and fixing the established
timeout value. The flow table GC thread will asynchronously free the
flow table and hardware offload entries.

Before this patch, the IPS_OFFLOAD_BIT remained set for expired flows,
which is also misleading since the flow is back on the classic
conntrack path.

If nf_ct_delete() removes the entry from the conntrack table, then it
calls nf_ct_put() which decrements the refcnt. This is not a problem
because the flowtable holds a reference to the conntrack object from
flow_offload_alloc() path which is released via flow_offload_free().

This patch also updates nft_flow_offload to skip packets in SYN_RECV
state. Since we might miss packets or bump them to the slow path, we do
not know what will happen there while we are still in SYN_RECV. This
patch therefore postpones offload until the next packet, which also
aligns with the existing behaviour in tc-ct.

flow_offload_teardown() no longer resets the existing tcp state from
flow_offload_fixup_tcp() to ESTABLISHED, because packets bumped to the
slow path might have already updated the state to CLOSE/FIN.

* How to test
Add the following flows to the OVS bridge in DPU OS:
# ovs-ofctl add-flow ovsbr1 "table=0, ip,ct_state=-trk, actions=ct(table=1)"
# ovs-ofctl add-flow ovsbr1 "table=1, ip,ct_state=+new, actions=ct(commit),normal"
# ovs-ofctl add-flow ovsbr1 "table=1, ip,ct_state=-new, actions=normal"

Start netserver on SUT:
# netserver -p 5007

Start multiple TCP_CRR tests on peer:
# count=1; while [ $count -lt 10 ]; do screen -d -m netperf -t TCP_CRR -H 11.0.0.2 -l 360 -- -r 1 -O "MIN_LATENCY,MAX_LATENCY,MEAN_LATENCY,P90_LATENCY,P99_LATENCY,P999_LATENCY,P9999_LATENCY,STDDEV_LATENCY,THROUGHPUT,THROUGHPUT_UNITS"; count=`expr $count + 1`; done
A huge number of connections will be established and torn down. After
the tests, some of them are not aged out:
# From /proc/net/nf_conntrack in DPU OS
ipv4     2 tcp      6 86354 LAST_ACK src=11.0.0.1 dst=11.0.0.2 sport=35862 dport=46797 src=11.0.0.2 dst=11.0.0.1 sport=46797 dport=35862 [ASSURED] mark=0 zone=0 use=2
ipv4     2 tcp      6 86354 LAST_ACK src=11.0.0.1 dst=11.0.0.2 sport=35862 dport=46797 src=11.0.0.2 dst=11.0.0.1 sport=46797 dport=35862 [ASSURED] mark=0 zone=0 use=2
ipv4     2 tcp      6 86354 LAST_ACK src=11.0.0.1 dst=11.0.0.2 sport=35862 dport=46797 src=11.0.0.2 dst=11.0.0.1 sport=46797 dport=35862 [ASSURED] mark=0 zone=0 use=2
The issue is usually reproduced after running the test several times.
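
To spot such leftover entries without reading the whole table by eye, a
filter along the following lines may help. It is only a sketch: the
function name and the 3600 second default threshold are assumptions for
illustration, and it relies on the /proc/net/nf_conntrack field layout
shown above (protocol name in field 3, remaining timeout in field 5, TCP
state in field 6).

```shell
# stale_tcp_entries [LIMIT]
# Reads /proc/net/nf_conntrack-style lines on stdin and prints TCP entries
# that are in a closing state but still have more than LIMIT seconds left.
# The default LIMIT of 3600 s is an illustrative assumption.
stale_tcp_entries() {
    awk -v limit="${1:-3600}" '
        $3 == "tcp" && $5 + 0 > limit &&
        $6 ~ /^(FIN_WAIT|CLOSE_WAIT|LAST_ACK|TIME_WAIT|CLOSE)$/ { print }
    '
}
```

On the DPU this could be run as
`stale_tcp_entries 3600 < /proc/net/nf_conntrack`; the LAST_ACK entries
listed above, with 86354 seconds remaining, would be printed.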

* What it could break.
N/A

** Affects: linux-bluefield (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-bluefield in Ubuntu.
https://bugs.launchpad.net/bugs/1975649


