Retrieve Tx timestamp from system BH instead of regular system workqueue.

The current implementation uses schedule_work() which is executed by the
system work queue and kworkers to retrieve Tx timestamps. This increases
latency and can lead to timeouts in case of heavy system load. i210 is
often used in industrial systems, where timestamp timeouts can be fatal.

Therefore, switch to the system BH workqueues which are executed in softirq
context shortly after the IRQ handler returns.

Tested between Intel i210 and i350 with ptp4l gPTP profile:

|ptp4l[30.405]: rms    4 max    7 freq +12825 +/-   3 delay   247 +/-   0
|ptp4l[31.406]: rms    2 max    3 freq +12829 +/-   3 delay   248 +/-   0
|ptp4l[32.406]: rms    3 max    3 freq +12827 +/-   3 delay   248 +/-   0
|ptp4l[33.406]: rms    2 max    3 freq +12827 +/-   3 delay   248 +/-   0
|ptp4l[34.407]: rms    3 max    6 freq +12825 +/-   4 delay   248 +/-   0
|ptp4l[35.407]: rms    3 max    6 freq +12822 +/-   4 delay   246 +/-   0
|ptp4l[36.407]: rms    7 max   10 freq +12812 +/-   5 delay   248 +/-   0
|ptp4l[37.408]: rms    5 max    8 freq +12808 +/-   3 delay   248 +/-   0

Furthermore, Miroslav Lichvar tested with ntpperf and chrony on Intel i350:

Without the patch:

|               |          responses            |        response time (ns)
|rate   clients |  lost invalid   basic  xleave |    min    mean     max stddev
|150000   15000   0.00%   0.00%   0.00% 100.00%    +4188  +36475 +193328  16179
|157500   15750   0.02%   0.00%   0.02%  99.96%    +6373  +42969 +683894  22682
|165375   16384   0.03%   0.00%   0.00%  99.97%    +7911  +43960 +692471  24454
|173643   16384   0.06%   0.00%   0.00%  99.94%    +8323  +45627 +707240  28452
|182325   16384   0.06%   0.00%   0.00%  99.94%    +8404  +47292 +722524  26936
|191441   16384   0.00%   0.00%   0.00% 100.00%    +8930  +51738 +223727  14272
|201013   16384   0.05%   0.00%   0.00%  99.95%    +9634  +53696 +776445  23783
|211063   16384   0.00%   0.00%   0.00% 100.00%   +14393  +54558 +329546  20473
|221616   16384   2.59%   0.00%   0.05%  97.36%   +23924 +321205 +518192  21838
|232696   16384   7.00%   0.00%   0.10%  92.90%   +33396 +337709 +575661  21017
|244330   16384  10.82%   0.00%   0.15%  89.03%   +34188 +340248 +556237  20880
|
|With the patch:
|150000   15000   5.11%   0.00%   0.00%  94.88%    +4426 +460642 +640884  83746
|157500   15750  11.54%   0.00%   0.26%  88.20%   +14434 +543656 +738355  30349
|165375   16384  15.61%   0.00%   0.31%  84.08%   +35822 +515304 +833859  25596
|173643   16384  19.58%   0.00%   0.37%  80.05%   +20762 +568962 +900100  28118
|182325   16384  23.46%   0.00%   0.42%  76.13%   +41829 +547974 +804170  27890
|191441   16384  27.23%   0.00%   0.46%  72.31%   +15182 +557920 +798212  28868
|201013   16384  30.51%   0.00%   0.49%  69.00%   +15980 +560764 +805576  29979
|211063   16384   0.06%   0.00%   0.00%  99.94%   +12668  +80487 +410555  62182
|221616   16384   2.94%   0.00%   0.05%  97.00%   +21587 +342769 +517566  23359
|232696   16384   6.94%   0.00%   0.10%  92.96%   +16581 +336068 +484574  18453
|244330   16384  11.45%   0.00%   0.14%  88.41%   +23608 +345023 +564130  19177

There are some minor differences at lower rates, but no performance
regressions at higher ones.

Reviewed-by: Paul Menzel <[email protected]>
Reviewed-by: Aleksandr Loktionov <[email protected]>
Signed-off-by: Kurt Kanzenbach <[email protected]>
---
Changes in v5:
- Adjust changelog wording (Aleksandr Loktionov)
- Include measurement numbers in changelog (Paul Menzel)
- Link to v4: 
https://patch.msgid.link/[email protected]

Changes in v4:
- Use BH workqueue (tasklet) instead of doing timestamping in IRQ path (Jakub 
Kicinski)
- Link to v3: 
https://patch.msgid.link/[email protected]

Changes in v3:
- Switch back to IRQ, but for i210 only
- Keep kworker for all other NICs like i350 (Miroslav)
- Link to v2: 
https://lore.kernel.org/r/[email protected]

Changes in v2:
- Switch from IRQ to PTP aux worker due to NTP performance regression (Miroslav)
- Link to v1: 
https://lore.kernel.org/r/[email protected]
---
 drivers/net/ethernet/intel/igb/igb_main.c | 4 ++--
 drivers/net/ethernet/intel/igb/igb_ptp.c  | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index ee99fd8fd513..9fd29fedb9f5 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -6572,7 +6572,7 @@ netdev_tx_t igb_xmit_frame_ring(struct sk_buff *skb,
                        adapter->ptp_tx_skb = skb_get(skb);
                        adapter->ptp_tx_start = jiffies;
                        if (adapter->hw.mac.type == e1000_82576)
-                               schedule_work(&adapter->ptp_tx_work);
+                               queue_work(system_bh_wq, &adapter->ptp_tx_work);
                } else {
                        adapter->tx_hwtstamp_skipped++;
                }
@@ -7076,7 +7076,7 @@ static void igb_tsync_interrupt(struct igb_adapter 
*adapter)
 
        if (tsicr & E1000_TSICR_TXTS) {
                /* retrieve hardware timestamp */
-               schedule_work(&adapter->ptp_tx_work);
+               queue_work(system_bh_wq, &adapter->ptp_tx_work);
        }
 
        if (tsicr & TSINTR_TT0)
diff --git a/drivers/net/ethernet/intel/igb/igb_ptp.c 
b/drivers/net/ethernet/intel/igb/igb_ptp.c
index bd85d02ecadd..7b44f9090631 100644
--- a/drivers/net/ethernet/intel/igb/igb_ptp.c
+++ b/drivers/net/ethernet/intel/igb/igb_ptp.c
@@ -832,7 +832,7 @@ static void igb_ptp_tx_work(struct work_struct *work)
                igb_ptp_tx_hwtstamp(adapter);
        else
                /* reschedule to check later */
-               schedule_work(&adapter->ptp_tx_work);
+               queue_work(system_bh_wq, &adapter->ptp_tx_work);
 }
 
 static void igb_ptp_overflow_check(struct work_struct *work)

---
base-commit: ad3dfa80be765757f612da04318248f6d20e4f71
change-id: 20250813-igb_irq_ts-1aa77cc7b4cb

Best regards,
--  
Kurt Kanzenbach <[email protected]>

Reply via email to