On 7/15/2013 2:06 PM, Bart Van Assche wrote:
On 14/07/2013 3:43, Sagi Grimberg wrote:
On 7/3/2013 3:58 PM, Bart Van Assche wrote:
Several InfiniBand HCAs allow the completion vector to be configured
per queue pair. This makes it possible to spread the workload created
by IB completion interrupts over multiple MSI-X vectors and hence over
multiple CPU cores. In other words, configuring the completion vector
properly not only reduces latency on an initiator connected to multiple
SRP targets but also improves throughput.
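
For illustration only, this is roughly what selecting a completion vector
looks like from user space with libibverbs (the device choice, CQ depth and
vector number below are arbitrary examples, not part of the patch):

    /* Sketch only, error handling abbreviated: open the first device and
     * create a CQ on an arbitrary completion vector. */
    #include <stdio.h>
    #include <infiniband/verbs.h>

    int main(void)
    {
        struct ibv_device **dev_list = ibv_get_device_list(NULL);
        struct ibv_context *ctx = ibv_open_device(dev_list[0]);

        /* Each completion vector maps to one MSI-X interrupt; spreading
         * CQs over the vectors spreads completion processing over cores. */
        int vec = ctx->num_comp_vectors - 1;    /* arbitrary example choice */

        struct ibv_cq *cq = ibv_create_cq(ctx, 256 /* cqe */,
                                          NULL /* cq_context */,
                                          NULL /* no comp channel */, vec);

        printf("CQ created on completion vector %d of %d\n",
               vec, ctx->num_comp_vectors);

        ibv_destroy_cq(cq);
        ibv_close_device(ctx);
        ibv_free_device_list(dev_list);
        return 0;
    }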
Hey Bart,

Just wrote a small patch to let srp_daemon spread connections across the
HCA's completion vectors. But rethinking this, is it really a good idea to
give the user control over completion vectors for CQs he doesn't really
own? This way the user must retrieve the maximum number of completion
vectors from the ib_device and take it into account when adding a
connection, and in addition will need to set proper IRQ affinity.

Perhaps the driver can manage this on its own without involving the user.
Take the mlx4_en driver for example: it spreads its CQs across the HCA's
completion vectors without involving the user, and the user that opens a
socket has no influence on the underlying cq<->comp-vector assignment.

The only use case I can think of where the user would want to use only a
subset of the completion vectors is if he wants to reserve some completion
vectors for native IB applications, but I don't know how common that is.

Other than that, I think it is always better to spread the CQs across the
HCA's completion vectors, so perhaps the driver should just assign
connection CQs across comp-vecs without taking arguments from the user and
simply iterate over comp_vectors, roughly as sketched below.
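
(Sketch only, in user-space terms just to show the round-robin idea; the
driver itself would use the kernel ib_create_cq API, and "next_vector" is a
made-up name:)

    #include <infiniband/verbs.h>

    static int next_vector;    /* per-HCA counter, made-up name */

    static struct ibv_cq *alloc_conn_cq(struct ibv_context *ctx, int cqe)
    {
        /* Each new connection's CQ lands on the next completion vector,
         * so completion interrupts spread over all MSI-X vectors without
         * any argument from the user. */
        int vec = next_vector++ % ctx->num_comp_vectors;

        return ibv_create_cq(ctx, cqe, NULL, NULL, vec);
    }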
What do you think?
Hello Sagi,
Sorry, but I do not think it is a good idea to let srp_daemon assign
the completion vector. While this might work well on single-socket
systems, it will give suboptimal results on NUMA systems. For
certain workloads on NUMA systems, and when a NUMA initiator system is
connected to multiple target systems, the optimal configuration is to
make sure that all processing associated with a single SCSI host occurs
on the same NUMA node. This means configuring the completion vector
value such that IB interrupts are generated on the same NUMA node where
the associated SCSI host and applications are running.
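
As an illustration of what "the same NUMA node" means in practice: the node
an HCA is attached to can be read from sysfs, and that is the node the IRQ
affinity masks and the applications should point at (sketch only; "mlx4_0"
is an example device name):

    #include <stdio.h>

    int main(void)
    {
        /* The "device" symlink points at the PCI device, which exposes
         * the NUMA node it is attached to. */
        FILE *f = fopen("/sys/class/infiniband/mlx4_0/device/numa_node", "r");
        int node = -1;

        if (f) {
            if (fscanf(f, "%d", &node) != 1)
                node = -1;
            fclose(f);
        }
        printf("HCA is attached to NUMA node %d\n", node);
        return 0;
    }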
More generally, performance tuning on NUMA systems requires
system-wide knowledge of all applications that are running and also of
which interrupt is processed by which NUMA node. So choosing a proper
value for the completion vector is only possible once the system
topology and the IRQ affinity masks are known. I don't think we should
build knowledge of all this into srp_daemon.
Bart.
Hey Bart,
Thanks for your quick attention to my question.
srp_daemon is a package designed to let the customer automatically
detect targets in the IB fabric. From our experience here at Mellanox,
customers/users like automatic "plug & play" tools. They are reluctant
to build their own scripting to enhance performance, and settle for
srp_daemon, which is preferred over using ibsrpdm and manually adding
new targets.
Regardless, the completion vector assignment is meaningless without
setting proper IRQ affinity, so in the worst case, where the user didn't
set his IRQ affinity, this assignment will perform like the default
completion vector assignment, since all IRQs are directed without any
masking, i.e. to core 0.
From my experiments on NUMA systems, optimal performance is gained
when all IRQs are directed to half of the cores on the NUMA node close
to the HCA, and all traffic generators share the other half of the cores
on the same NUMA node. So based on that knowledge, I thought that
srp_daemon/the srp driver would assign its CQs across the HCA's
completion vectors, and the user would be encouraged to set the IRQ
affinity as described above (see the sketch below) to gain optimal
performance.
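
For illustration, directing an interrupt to a specific set of cores comes
down to writing a hex CPU mask to /proc/irq/<n>/smp_affinity; the IRQ
number and mask below are made-up examples, and a real setup would loop
over all of the HCA's completion vector IRQs:

    #include <stdio.h>

    int main(void)
    {
        /* Example: steer IRQ 120 to CPUs 0-3 (mask 0xf). */
        FILE *f = fopen("/proc/irq/120/smp_affinity", "w");

        if (!f) {
            perror("smp_affinity");
            return 1;
        }
        fprintf(f, "%x\n", 0xf);
        fclose(f);
        return 0;
    }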
Adding connections over the far NUMA node doesn't seem to benefit
performance much...
As I mentioned, a use case that may raise a problem here is if
the user would like to maintain multiple SRP connections and reserve
some completion vectors for other IB applications on the system.
In this case the user would be able to disable the srp_daemon/srp driver
completion vector assignment.
So, this was just an idea, and an easy implementation that would
potentially give the user a semi-automatic, performance-optimized
configuration...
-Sagi