We would need to make a case that this truly provides higher throughput or lower latency than IPoIB or SDP. Eric/Peter, do you guys have any measurements that might substantiate this performance assumption?
Performance comparisons are fraught with danger given varying hardware/firmware/software revisions. You really have to run dedicated tests on the same hardware before you can compare with confidence.
One thing I haven't mentioned is that LNET has both kernel and userspace implementations. These share the bulk of the network-independent code, but the LND implementations are not shared. Currently we only support TCP/IP and the native Cray XT3 network in userspace. It would be quite easy to add a system call interface to export the kernel LNET API to userspace, but dedicated userspace LND versions would be required to deliver the lowest latency you'd expect from OS bypass.
Cheers,
Eric
In the MX (Myrinet Express) LND, the MX API itself is identical in kernel space and user space. The differences are confined to the LND and concern the thread creation and synchronization methods (kernel threads vs. pthreads, spinlocks vs. pthread_mutex_lock(), etc.). Latency and bandwidth should be equivalent in user and kernel space, with the exception below.
The only performance difference between MX in the kernel and in user space is the handling of multiple segments. In the kernel, we made optimizations for Lustre to handle 256 kernel pages, for example. In user space, MX is not similarly optimized and would try to copy the segments into a single buffer before sending. This cuts bandwidth by half on 10 Gb/s fabrics. If LNET were pushed into user space, we would look at providing this optimization.
Scott
_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss