Hello folks,

I'm currently trying to learn more about DRM and discovered the
following code sequence:


drivers/gpu/drm/amd/amdkfd/kfd_device.c, Line 824 on 6.4-rc7

static inline void kfd_queue_work(struct workqueue_struct *wq,
                                  struct work_struct *work)
{
        int cpu, new_cpu;

        cpu = new_cpu = smp_processor_id();
        do {
                new_cpu = cpumask_next(new_cpu, cpu_online_mask) % nr_cpu_ids;
                if (cpu_to_node(new_cpu) == numa_node_id())
                        break;
        } while (cpu != new_cpu);

        queue_work_on(new_cpu, wq, work);
}

/* This is called directly from KGD at ISR. */
void kgd2kfd_interrupt(struct kfd_dev *kfd, const void *ih_ring_entry)
{
        uint32_t patched_ihre[KFD_MAX_RING_ENTRY_SIZE];
        bool is_patched = false;
        unsigned long flags;

        if (!kfd->init_complete)
                return;

        if (kfd->device_info.ih_ring_entry_size > sizeof(patched_ihre)) {
                dev_err_once(kfd_device, "Ring entry too small\n");
                return;
        }

        spin_lock_irqsave(&kfd->interrupt_lock, flags);

        if (kfd->interrupts_active
            && interrupt_is_wanted(kfd, ih_ring_entry,
                                   patched_ihre, &is_patched)
            && enqueue_ih_ring_entry(kfd,
                                     is_patched ? patched_ihre : ih_ring_entry))
                kfd_queue_work(kfd->ih_wq, &kfd->interrupt_work);

        spin_unlock_irqrestore(&kfd->interrupt_lock, flags);
}


These functions seem to be exclusively invoked by amdgpu_irq_dispatch()
in amdgpu_irq.c
At first glance it seems to me that it's just a typical scenario taking
place here: Interrupt arises, interrupt submits work to wq, then jumps
back to sleep / former process execution context again.

What I don't understand is why it's apparently important to schedule
the work on a particular CPU.

It seems that the do-while in kfd_queue_work() is searching for a CPU
within the same NUMA-Node. Thus I suspect that this is done because
either
a) performance requires it or
b) the work-function needs access to something that's only available 
   within the same node.

I suspect there is an interrupt-related reason why that particular work
should be enqueued on a specific CPU. Just by reading the code alone I
can't really figure out why precisely that's necessary, though.

Does someone have any hints for me? :)

Cheers,
Philipp


Reply via email to