If we can
make a case that this truly provides higher throughput/lower
latency than IPoIB or SDP.  Eric/Peter, do you guys have any
measurements that may substantiate this performance assumption?

Performance comparisons are fraught with danger vis-à-vis
hardware/firmware/software revisions. You really have to run
dedicated tests on the same hardware before you can compare with
confidence.

One thing I haven't mentioned is that LNET has both kernel and
userspace implementations. These share the bulk of the
network-independent code, but the LND implementations are not
shared. Currently we only support TCP/IP and the native Cray XT3
network in userspace. It would be quite easy to add a system call
interface to export the kernel LNET API to userspace, but
dedicated userspace LND versions would be required to deliver the
lowest latency you'd expect from OS bypass.

                Cheers,
                        Eric

In the MX (Myrinet Express) LND, the MX API itself is identical
in kernel-space and user-space. The differences are confined to
the LND itself: thread creation and synchronization methods
(kernel threads vs. pthreads, spinlocks vs.
pthread_mutex_lock(), etc.). Latency and bandwidth should be
equivalent for user vs. kernel space, with the exception below.

The only performance difference between MX in the kernel and
user-space is the handling of multiple segments. In the kernel,
we made optimizations for Lustre to handle 256 kernel pages, for
example. In user-space, MX is not similarly optimized and would
try to copy them into a single buffer before sending. This cuts
bandwidth by half on 10 Gb/s fabrics. If LNET were pushed into
user-space, we would look at providing this optimization.

Scott

_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
