On 8/24/2018 9:17 PM, Sagi Grimberg wrote:
>
>>> nvme-rdma attempts to map queues based on irq vector affinity.
>>> However, for some devices, completion vector irq affinity is
>>> configurable by the user which can break the existing assumption
>>> that irq vectors are optimally arranged over the host cpu cores.
>>
>> IFF affinity is configurable we should never use this code,
>> as it breaks the model entirely. ib_get_vector_affinity should
>> never return a valid mask if affinity is configurable.
>
> I agree that the initially intended model doesn't fit. But it seems
> that some users like to write to their NIC's
> /proc/irq/$IRQ/smp_affinity and get mad at us for not letting them,
> because we use managed affinity.
>
> So instead of falling back to the block mapping function we try
> to do a little better first:
> 1. map according to the device vector affinity
> 2. map vectors that end up without a mapping to cpus that belong
> to the same numa-node
> 3. map all the rest of the unmapped cpus like the block layer
> would do.
>
> We could have device drivers that don't use managed affinity never
> return a valid mask, but that would rule out affinity-based mapping
> entirely, and that mapping is optimal at least for users who do not
> mess with device irq affinity (which is probably the majority of users).
>
> Thoughts?
Can we please make forward progress on this?
Christoph, Sagi: it seems you think writing to /proc/irq/$IRQ/smp_affinity
shouldn't be allowed if the driver uses managed affinity. Is that correct?
Perhaps that could be codified as a way forward? I.e., somehow allow
the admin to choose either "managed by the driver/ULPs" or "managed
directly by the system admin"?
Or just use Sagi's patch. Perhaps a WARN_ONCE() if the affinity looks
wonked when set via procfs? Just thinking out loud...
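Something like the sketch below is the sort of check I'm picturing.
It's a rough sketch only: the helper name and the saved_mask
bookkeeping are made up for illustration; only ib_get_vector_affinity()
and the cpumask helpers are real kernel APIs.

#include <linux/bug.h>
#include <linux/cpumask.h>
#include <rdma/ib_verbs.h>

/*
 * Warn once if the vector's current affinity no longer matches what
 * the driver reported when the queues were mapped.  saved_mask is
 * hypothetical state the ulp would have to keep from map time.
 */
static bool vector_affinity_unchanged(struct ib_device *dev, int vec,
				      const struct cpumask *saved_mask)
{
	const struct cpumask *cur = ib_get_vector_affinity(dev, vec);

	if (!cur || !saved_mask)
		return false;

	return !WARN_ONCE(!cpumask_equal(cur, saved_mask),
			  "comp vector %d affinity changed via procfs, "
			  "falling back to default queue mapping\n", vec);
}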
But as it stands, things are just plain borked if an rdma driver
supports ib_get_vector_affinity() yet the admin changes the affinity via
/proc...
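For reference, here is roughly how I read Sagi's three steps, written
cpu-centrically and with made-up names. This is not the actual patch,
just a sketch of the fallback order: device affinity first, then same
NUMA node, then a plain round-robin spread like the block layer default.

#include <linux/kernel.h>
#include <linux/cpumask.h>
#include <linux/topology.h>
#include <rdma/ib_verbs.h>

/* map[] is indexed by cpu and holds the chosen queue; UINT_MAX = unmapped */
static void sketch_rdma_map_queues(struct ib_device *dev, int first_vec,
				   unsigned int nr_queues, unsigned int *map)
{
	const struct cpumask *mask;
	unsigned int queue, cpu, other;

	for_each_possible_cpu(cpu)
		map[cpu] = UINT_MAX;

	/* 1. follow the device's reported per-vector affinity */
	for (queue = 0; queue < nr_queues; queue++) {
		mask = ib_get_vector_affinity(dev, first_vec + queue);
		if (!mask)
			continue;
		for_each_cpu(cpu, mask)
			map[cpu] = queue;
	}

	/*
	 * 2. steer still-unmapped cpus to a queue already in use on the
	 *    same numa node
	 */
	for_each_possible_cpu(cpu) {
		if (map[cpu] != UINT_MAX)
			continue;
		for_each_possible_cpu(other) {
			if (map[other] != UINT_MAX &&
			    cpu_to_node(other) == cpu_to_node(cpu)) {
				map[cpu] = map[other];
				break;
			}
		}
	}

	/* 3. whatever is left gets the plain block-layer style spread */
	for_each_possible_cpu(cpu)
		if (map[cpu] == UINT_MAX)
			map[cpu] = cpu % nr_queues;
}

At least that way cpus stay on a queue from their own node whenever the
device affinity gives us something to work with, and nothing is ever
left unmapped.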
Thanks,
Steve.