I was reviewing some rather fetid marketing collateral
about this topic, and found mostly stuff from 2010ish.
A lot has changed since then: onboard PCIe, CPU speed,
inter-socket bus, NUMA sensitivity of the kernel, lots
more cores, mem BW, presumably smarter applications, etc.

Does anyone have comments on recent generations of onload
vs offload interconnect performance? Please respond only with
results that are recent and fully quantified (HW config, how
measured, etc).

I'd also be interested to hear from MPI/app people about how
useful offload really is: how often can real apps leverage
RDMA ops, or the simple sorts of collectives that are offloadable?

As keeper of probably the oldest living Quadrics system, I appreciate
the appeal of offload.  OTOH, there's no question that onloading puts
a lot of performance potential into the CPU-designer's hands...

thanks, mark hahn.
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing