Hi Christian,

Sorry for the very delayed answer.

At 03:16 27.03.2007, Christian Bell wrote:
> I can't type, 482 was indeed a typo.  But still, I wouldn't look at
> the absolute numbers "as is" since the single-node base case has
> different performance.  Since 1x2x1 is our only common base case and
> since Scali is faster at 4212 versus 4863, the IB interconnect you're
> testing should be achieving 416s instead of 550s to produce strong
> scaling in line with the 8x2x2 InfiniPath time to solution
> (at 482s).

Well, you do know Amdahl vs. Gustafson, right? The dataset is fixed, and the elapsed time includes initialization, writing of animation files, and more. Hence, slower per-node performance will _scale_ better: with a fixed serial part, the slower base case has a larger parallel fraction, so its speedup curve looks steeper.
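
To make the Amdahl point concrete, here is a minimal sketch in C (the 300 s serial component is an assumption for illustration; the 1x2x1 times are the ones quoted above). With a fixed serial part, the slower base case reports the higher parallel efficiency, although its absolute time to solution stays worse:

    /* Fixed-size (Amdahl) model with an assumed constant serial part
     * (initialization + animation-file writes). */
    #include <stdio.h>

    int main(void)
    {
        const double serial = 300.0;    /* assumed serial seconds      */
        const double fast1  = 4212.0;   /* Scali 1x2x1 base case       */
        const double slow1  = 4863.0;   /* slower 1x2x1 base case      */

        for (int n = 1; n <= 16; n *= 2) {
            double tf = serial + (fast1 - serial) / n;
            double ts = serial + (slow1 - serial) / n;
            printf("n=%2d  fast %6.0fs (eff %5.1f%%)  slow %6.0fs (eff %5.1f%%)\n",
                   n, tf, 100.0 * fast1 / (n * tf),
                   ts, 100.0 * slow1 / (n * ts));
        }
        return 0;
    }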

For this application field, crashworthiness testing, most users keep the number of cores constant throughout the duration of a project (12-18 months), due to numerical stability and the verification thereof. Hence, the interesting point is not how far and how fast you could run, but the cost of a system capable of running the application instances at 60-80% parallel efficiency.
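
To illustrate that sizing criterion with the same assumed model as above (the per-node price is a made-up placeholder, not a quote): one would pick the largest configuration that stays inside the 60-80% efficiency band and compare cost per unit of throughput there:

    /* Same assumed Amdahl model as above, now reporting throughput
     * (jobs/day) and a hypothetical cost per job/day.  node_price is
     * a placeholder for illustration only. */
    #include <stdio.h>

    int main(void)
    {
        const double t1 = 4212.0, serial = 300.0;  /* assumed, as above */
        const double node_price = 5000.0;          /* placeholder price */

        for (int n = 1; n <= 32; n *= 2) {
            double tn   = serial + (t1 - serial) / n;
            double eff  = 100.0 * t1 / (n * tn);
            double jobs = 86400.0 / tn;            /* jobs per day      */
            printf("n=%2d  eff %5.1f%%  %5.1f jobs/day  %6.0f $/(job/day)\n",
                   n, eff, jobs, n * node_price / jobs);
        }
        return 0;
    }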

As to RDMA- vs. message-passing-based interconnect semantics, the problem I am facing is that the RDMA interconnect I am using more or less collapses at 32 cores. Using alltoall with a packet size of 1 kB, it actually performs worse than GbE. Sigh! (And please do not turn this into vendor harassment, as I am pretty sure this has to do with the implementation and not the architecture.) So, what I have shown is that an RDMA interconnect performs faster than a message-passing interconnect which has roughly 3x lower latency and 20x (?) higher message rate, up to a scaling point where the RDMA _implementation_ collapses. And this _despite_ the fact that the RDMA-based MPI has to perform the MPI message matching.
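
For reference, the kind of measurement behind that observation could look like the sketch below: time MPI_Alltoall at 1 kB per peer (the iteration count and the missing warm-up phase are simplifications):

    /* Minimal alltoall timing sketch; compile with e.g. mpicc -O2.
     * 1 kB per peer is the size discussed above; no warm-up, no
     * per-size sweep, just the average over a fixed iteration count. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        const int msg = 1024, iters = 100;
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        char *sbuf = calloc((size_t)size, msg);
        char *rbuf = calloc((size_t)size, msg);

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < iters; i++)
            MPI_Alltoall(sbuf, msg, MPI_BYTE, rbuf, msg, MPI_BYTE,
                         MPI_COMM_WORLD);
        double us = (MPI_Wtime() - t0) / iters * 1e6;

        if (rank == 0)
            printf("%d ranks, %d B per peer: %.1f us per alltoall\n",
                   size, msg, us);

        free(sbuf); free(rbuf);
        MPI_Finalize();
        return 0;
    }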


> With equal metrics/performance and phrased in this manner, it seems
> that RDMA still has to implement the semantics that message-passing
> already provides, which suggests in this case that the RDMA interface
> is at a loss.  Maybe I'm missing something to your question...

I doubt you're missing anything ;-) But let me stress that as the number of cores per node scales, an HCA with message-passing semantics, i.e. with the message matching done in the HCA, will have a constant message-matching rate. For an RDMA-based MPI, which uses the cores for message matching, the message-matching rate would be almost proportional to the number of cores...
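
Since the argument hinges on where the matching runs, here is a minimal sketch of what "MPI message matching" means (hypothetical types; a real MPI adds an unexpected-message queue, ordering rules for wildcards, and so on):

    /* Every incoming message header must be matched against the
     * posted-receive queue on (communicator, source, tag), honouring
     * the MPI wildcards.  Whether this loop runs on a host core
     * (RDMA-based MPI) or inside the HCA (message-passing HCA) is
     * exactly the trade-off discussed above. */
    #include <stddef.h>

    #define ANY_SRC (-1)              /* stands in for MPI_ANY_SOURCE */
    #define ANY_TAG (-2)              /* stands in for MPI_ANY_TAG    */

    struct posted_recv {
        int comm, src, tag;           /* match criteria               */
        void *buf;                    /* destination buffer           */
        struct posted_recv *next;     /* MPI mandates FIFO matching   */
    };

    /* Return the first posted receive matching an incoming header,
     * or NULL (the message then goes to the unexpected queue). */
    struct posted_recv *match(struct posted_recv *q,
                              int comm, int src, int tag)
    {
        for (; q; q = q->next)
            if (q->comm == comm &&
                (q->src == ANY_SRC || q->src == src) &&
                (q->tag == ANY_TAG || q->tag == tag))
                return q;
        return NULL;
    }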

Håkon



--
Håkon Bugge
CTO
dir. +47 22 62 89 72
mob. +47 92 48 45 14
fax. +47 22 62 89 51
[EMAIL PROTECTED]
Skype: hakon_bugge

Scali - http://www.scali.com
Scaling the Linux Datacenter


_______________________________________________
Beowulf mailing list, [EMAIL PROTECTED]
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
