> -----Original Message-----
> From: Intel-wired-lan <[email protected]> On Behalf Of 
> Krishna Kumar
> Sent: 20 May 2025 22:37
> To: [email protected]
> Cc: [email protected]; [email protected]; Nguyen, Anthony L 
> <[email protected]>; Kitszel, Przemyslaw 
> <[email protected]>; [email protected]; 
> [email protected]; [email protected]; [email protected]; 
> [email protected]; Samudrala, Sridhar <[email protected]>; Zaki, 
> Ahmed <[email protected]>; Kumar, Krishna <[email protected]>
> Subject: [Intel-wired-lan] [PATCH v2 net] net: ice: Perform accurate aRFS 
> flow match
>
> This patch fixes an issue seen in a large-scale deployment under heavy 
> incoming pkts where the aRFS flow wrongly matches a flow and reprograms the 
> NIC with wrong settings. That mis-steering causes RX-path latency spikes and 
> noisy neighbor effects when many connections collide on the same hash (some 
> of our production servers have 20-30K connections).
>
> set_rps_cpu() calls ndo_rx_flow_steer() with a flow_id that is calculated by 
> hashing the skb and truncating the hash to the per-rx-queue table size. This 
> results in multiple connections (even across different rx-queues) getting 
> the same hash value. The driver steer function then modifies the wrong flow 
> to use this rx-queue, e.g.: Flow#1 is first added:
>    Flow#1:  <ip1, port1, ip2, port2>, Hash 'h', q#10
>
> Later when a new flow needs to be added:
>    Flow#2:  <ip3, port3, ip4, port4>, Hash 'h', q#20
>
> The driver finds the hash 'h' from Flow#1 and updates it to use q#20. This 
> leaves both flows un-optimized: packets for Flow#1 go to q#20, the rule is 
> later reprogrammed back to q#10, and so on; Flow#2 is never programmed 
> because Flow#1 is matched first on every miss. Many flows may wrongly share 
> the same hash, each reprogramming the original flow's rule with its own q#.
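
The collision described above can be sketched in a few lines of C. This is
only an illustration under stated assumptions, not the ice driver code (the
diff body is not part of this message): the struct layouts, field names,
function names and the two 32-bit hash values used below are all
hypothetical. It shows that two different tuples can truncate to the same
flow_id, and that comparing the full tuple in addition to the flow_id
rejects the false match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 4-tuple plus protocol; field names are illustrative only. */
struct flow_tuple {
	uint32_t src_ip;
	uint32_t dst_ip;
	uint16_t src_port;
	uint16_t dst_port;
	uint8_t  ip_proto;
};

/* Hypothetical software filter entry, keyed by the truncated flow_id. */
struct arfs_entry {
	uint32_t flow_id;
	struct flow_tuple tuple;
	uint16_t rxq_idx;
};

/*
 * Truncate a 32-bit packet hash to a table index.
 * Assumes table_size is a power of two.
 */
uint32_t flow_id_from_hash(uint32_t hash, uint32_t table_size)
{
	return hash & (table_size - 1);
}

/* Hash-only match: any flow with the same truncated hash "hits". */
bool match_flow_id_only(const struct arfs_entry *e, uint32_t flow_id)
{
	return e->flow_id == flow_id;
}

/* Field-by-field compare, so struct padding never matters. */
bool tuple_equal(const struct flow_tuple *a, const struct flow_tuple *b)
{
	return a->src_ip == b->src_ip &&
	       a->dst_ip == b->dst_ip &&
	       a->src_port == b->src_port &&
	       a->dst_port == b->dst_port &&
	       a->ip_proto == b->ip_proto;
}

/* Full match: same truncated hash AND the same tuple. */
bool match_full_tuple(const struct arfs_entry *e, uint32_t flow_id,
		      const struct flow_tuple *t)
{
	return e->flow_id == flow_id && tuple_equal(&e->tuple, t);
}
```

With a 512-entry table, the made-up hashes 0x12345a10 and 0x9abcde10 both
truncate to flow_id 0x10. An entry stored for the first tuple is therefore
hit by match_flow_id_only() when the second tuple arrives, while
match_full_tuple() returns false and the second flow can get its own entry.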
>
> Tested on two 144-core servers with 16K netperf sessions for 180s. Netperf 
> clients are pinned to cores 0-71 sequentially (so that wrong packets on q#s 
> 72-143 can be measured). IRQs are set 1:1 for queues -> CPUs, with XPS and 
> aRFS enabled (global value is 144 * rps_flow_cnt).
>
> Test notes about results from ice_rx_flow_steer():
> ---------------------------------------------------
> 1. "Skip:" counter increments here:
>    if (fltr_info->q_index == rxq_idx ||
>        arfs_entry->fltr_state != ICE_ARFS_ACTIVE)
>            goto out;
> 2. "Add:" counter increments here:
>    ret = arfs_entry->fltr_info.fltr_id;
>    INIT_HLIST_NODE(&arfs_entry->list_entry);
> 3. "Update:" counter increments here:
>    /* update the queue to forward to on an already existing flow */
>
> Runtime comparison: original code vs with the patch for different 
> rps_flow_cnt values.
>
> +-------------------------------+--------------+--------------+
> | rps_flow_cnt                  |      512     |    2048      |
> +-------------------------------+--------------+--------------+
> | Ratio of Pkts on Good:Bad q's | 214 vs 822K  | 1.1M vs 980K |
> | Avoid wrong aRFS programming  | 0 vs 310K    | 0 vs 30K     |
> | CPU User                      | 216 vs 183   | 216 vs 206   |
> | CPU System                    | 1441 vs 1171 | 1447 vs 1320 |
> | CPU Softirq                   | 1245 vs 920  | 1238 vs 961  |
> | CPU Total                     | 29 vs 22.7   | 29 vs 24.9   |
> | aRFS Update                   | 533K vs 59   | 521K vs 32   |
> | aRFS Skip                     | 82M vs 77M   | 7.2M vs 4.5M |
> +-------------------------------+--------------+--------------+
>
> Separate TCP_STREAM and TCP_RR runs with 1,4,8,16,64,128,256,512 connections 
> showed no performance degradation.
>
> Some points on the patch/aRFS behavior:
> 1. Enabling full tuple matching ensures flows are always correctly matched,
>   even with smaller hash sizes.
> 2. 5-6% drop in CPU utilization as the packets arrive at the correct CPUs
>   and fewer calls are made to the driver for programming on misses.
> 3. Larger hash tables reduce mis-steering due to more unique flow hashes,
>   but still have clashes. However, with a larger per-device rps_flow_cnt, old
>   flows take more time to expire and new aRFS flows cannot be added if h/w
>   limits are reached (rps_may_expire_flow() succeeds when 10*rps_flow_cnt
>   pkts have been processed by this cpu that are not part of the flow).
>
> Changes since v1:
>  - Added "Fixes:" tag and documented return values.
>  - Added @ for function parameters.
>  - Updated subject line to denote target tree (net)
>
> Fixes: 28bf26724fdb0 ("ice: Implement aRFS")
> Signed-off-by: Krishna Kumar <[email protected]>
> ---
> drivers/net/ethernet/intel/ice/ice_arfs.c | 49 +++++++++++++++++++++++
> 1 file changed, 49 insertions(+)
>

Tested-by: Rinitha S <[email protected]> (A Contingent worker at Intel)