On 2024-12-09 6:29 p.m., Joe Damato wrote:
On Mon, Dec 09, 2024 at 05:26:22PM -0700, Ahmed Zaki wrote:
A common task for most drivers is to remember the user's CPU affinity for
their IRQs. On each netdev reset, the driver must then re-apply the
user's settings to the IRQs.

Add CPU affinity mask to napi->config. To delegate the CPU affinity
management to the core, drivers must:
  1 - add a persistent napi config:     netif_napi_add_config()
  2 - bind an IRQ to the napi instance: netif_napi_set_irq()

The core will then re-apply the saved affinity to the napi's IRQ on each
reset.

The default affinity mask for all IRQs is the set of all online CPUs.

Suggested-by: Jakub Kicinski <[email protected]>
Signed-off-by: Ahmed Zaki <[email protected]>
---

[...]

diff --git a/net/core/dev.c b/net/core/dev.c
index 6ef9eb401fb2..778ba27d2b83 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6699,11 +6699,35 @@ void netif_queue_set_napi(struct net_device *dev, unsigned int queue_index,
  }
  EXPORT_SYMBOL(netif_queue_set_napi);
+static void
+netif_napi_affinity_notify(struct irq_affinity_notify *notify,
+                          const cpumask_t *mask)
+{
+       struct napi_struct *napi =
+               container_of(notify, struct napi_struct, affinity_notify);
+
+       if (napi->config)
+               cpumask_copy(&napi->config->affinity_mask, mask);
+}
+
+static void
+netif_napi_affinity_release(struct kref __always_unused *ref)
+{
+}
+
  static void napi_restore_config(struct napi_struct *n)
  {
        n->defer_hard_irqs = n->config->defer_hard_irqs;
        n->gro_flush_timeout = n->config->gro_flush_timeout;
        n->irq_suspend_timeout = n->config->irq_suspend_timeout;
+
+       if (n->irq > 0 && n->irq_flags & NAPIF_F_IRQ_AFFINITY) {
+               n->affinity_notify.notify = netif_napi_affinity_notify;
+               n->affinity_notify.release = netif_napi_affinity_release;
+               irq_set_affinity_notifier(n->irq, &n->affinity_notify);
+               irq_set_affinity(n->irq, &n->config->affinity_mask);
+       }
+
        /* a NAPI ID might be stored in the config, if so use it. if not, use
         * napi_hash_add to generate one for us. It will be saved to the config
         * in napi_disable.
@@ -6720,6 +6744,8 @@ static void napi_save_config(struct napi_struct *n)
        n->config->gro_flush_timeout = n->gro_flush_timeout;
        n->config->irq_suspend_timeout = n->irq_suspend_timeout;
        n->config->napi_id = n->napi_id;
+       if (n->irq > 0 && n->irq_flags & NAPIF_F_IRQ_AFFINITY)
+               irq_set_affinity_notifier(n->irq, NULL);

My understanding when I attempted this was that using generic IRQ
notifiers breaks ARFS [1], because IRQ notifiers only support a
single notifier and so drivers with ARFS can't _also_ set their own
notifiers for that.
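
To make the conflict concrete: the genirq layer keeps a single
`affinity_notify` pointer per IRQ descriptor, so a second registration
replaces the first. A rough paraphrase of what
`irq_set_affinity_notifier()` in kernel/irq/manage.c does (details
elided, not a verbatim copy):

	/* each irq_desc holds exactly one notifier slot */
	old_notify = desc->affinity_notify;
	desc->affinity_notify = notify;	/* new registration replaces old */
	...
	if (old_notify)
		kref_put(&old_notify->kref, old_notify->release);

So if the core installs its notifier, a driver's ARFS rmap notifier
(or vice versa) is silently dropped.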

Thank you for the review and reply. I had been wondering why some drivers check for ARFS (in buggy ways) before setting affinity notifiers; now I have a better understanding.


Two ideas were proposed in the thread I mentioned:
   1. Have multiple notifiers per IRQ so that having a generic core
      based notifier wouldn't break ARFS.
   2. Jakub mentioned calling cpu_rmap_update from the core so that a
      generic solution wouldn't be blocked.

I don't know anything about option 1, so I looked at option 2.

At the time when I read the code, it seemed that cpu_rmap_update
required some state be passed in (struct irq_glue), so in that case,
the only way to call cpu_rmap_update from the core would be to
maintain some state about ARFS in the core, too, so that drivers
which support ARFS won't be broken by this change.
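
For reference, the per-IRQ state in question is (paraphrasing
lib/cpu_rmap.c):

	struct irq_glue {
		struct irq_affinity_notify notify;	/* occupies the one notifier slot */
		struct cpu_rmap *rmap;			/* the reverse map being kept in sync */
		u16 index;				/* this IRQ's slot in the rmap */
	};

i.e. the ARFS rmap machinery itself owns the affinity notifier, which
is why calling cpu_rmap_update() from the core would require the core
to hold equivalent state.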

At that time there was no persistent per-NAPI config, but since
there is now, there might be a way to solve this.

Just guessing here, but maybe one way to solve this would be to move
ARFS into the core by:
   - Adding a new bit in addition to NAPIF_F_IRQ_AFFINITY... I don't
     know, NAPIF_F_ARFS_AFFINITY or something? So that drivers
     could express that they support ARFS.
   - Remove the driver calls to irq_cpu_rmap_add and make sure to
     pass the new bit in for drivers that support ARFS (in your
     changeset, I believe that would be at least ice, mlx4, and
     bnxt... possibly more?).
   - In the generic core code, if the ARFS bit is set then you pass
     in the state needed for ARFS to work, otherwise do what the
     proposed code is doing now.

But, that's just a guess. Maybe there's a better way.
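
Just to sketch the shape of that guess (NAPIF_F_ARFS_AFFINITY and
netif_napi_rmap_set_affinity() are made-up names, not existing
kernel symbols):

	static void napi_restore_irq_affinity(struct napi_struct *n)
	{
		if (n->irq <= 0)
			return;

		if (n->irq_flags & NAPIF_F_ARFS_AFFINITY) {
			/* hypothetical core-owned rmap path: update the
			 * device's rx_cpu_rmap and re-register the notifier
			 */
			netif_napi_rmap_set_affinity(n, &n->config->affinity_mask);
		} else if (n->irq_flags & NAPIF_F_IRQ_AFFINITY) {
			/* what the proposed patch does today */
			n->affinity_notify.notify = netif_napi_affinity_notify;
			n->affinity_notify.release = netif_napi_affinity_release;
			irq_set_affinity_notifier(n->irq, &n->affinity_notify);
			irq_set_affinity(n->irq, &n->config->affinity_mask);
		}
	}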

I will look into all of this and send a new version. Yes, it is clear that the core needs to manage ARFS rmap creation and updates alongside the affinity restoration.

Ahmed
