Hi again Christian,
At 16:59 26.03.2007, Christian Bell wrote:
Hi Håkon,
I'm unsure I would call a submission significant when it compares results between configurations that aren't compared at scale (apparently a large versus a small switch, and a much heavier shared-memory component at small process counts). For example, in your submitted configurations the interconnect communication (inter-node) is never exercised more than shared memory (intra-node), and when the interconnect does become dominant at 32 procs, that's when InfiniPath is faster.
Not sure how you count this. In my "world", all processes communicate with more remote processes than local ones in all cases except the single-node runs. I.e., in a two-node case with 2 or 4 processes per node, a process has 1 or 3 other local processes and 2 or 4 other remote processes. Excluding the single-node cases, we have six runs (2x2, 4x2, 8x2, 2x4, 4x4, 8x4), and RDMA is faster than message passing in 5 of them.
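To make the counting concrete, here is a trivial sketch (my own illustration, not anything from the benchmark itself), assuming the runs above are written as nodes x processes-per-node:

    #include <stdio.h>

    /* For `nodes` nodes with `ppn` processes each, every process has
       (ppn - 1) local peers (shared memory) and ppn * (nodes - 1)
       remote peers (interconnect). */
    static void peer_counts(int nodes, int ppn)
    {
        printf("%dx%d: %d local, %d remote peers per process\n",
               nodes, ppn, ppn - 1, ppn * (nodes - 1));
    }

    int main(void)
    {
        int runs[6][2] = { {2,2}, {4,2}, {8,2}, {2,4}, {4,4}, {8,4} };
        for (int i = 0; i < 6; i++)
            peer_counts(runs[i][0], runs[i][1]);
        return 0;
    }

Only the single-node runs have no remote peers, which is why I exclude them.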
As to the 32-core case, I run just as fast as InfiniPath there, but with a product that has not been released (yet); hence I haven't published it.
And based on this I did not call these significant findings, but merely an indication that RDMA is faster (up to 16 cores) than, or as fast as, message passing for _this_ application and dataset.
On the flip side, you're right that these
results show the importance of an MPI
implementation (at least for shared memory),
which also means your product is well positioned
for the next generation of node configurations
in this regard. However, because of the node
configurations and because this is really one
benchmark, I can't take these results as
indicative of general interconnect
performance. Oh, and because you're forcing me to compare results in this table, I now see what Patrick at Myricom was saying -- the largest config you show that stresses the interconnect (8x2x2) takes 596s of walltime on a similar Mellanox DDR setup and 452s on InfiniPath SDR (yes, the pipe is half as wide, but the performance is 25% better).
Just to avoid any confusion: the 596s number is _not_ with Scali MPI Connect (SMC), but with a competing MPI implementation. SMC achieves 551s using SDR. I must admit your InfiniPath number is new to me, as topcrunch reports 482s for this configuration with InfiniPath.
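Just to make the percentages explicit, here is a trivial sketch over the walltimes quoted in this thread (the pairings below are mine, and which numbers are fair to pair is of course part of the argument):

    #include <stdio.h>

    /* Walltimes (seconds) for the 8x2x2 run, as quoted in this thread. */
    static void compare(const char *slow, double ts, const char *fast, double tf)
    {
        printf("%s (%.0fs) vs %s (%.0fs): %.1f%% less walltime\n",
               fast, tf, slow, ts, 100.0 * (ts - tf) / ts);
    }

    int main(void)
    {
        compare("competing MPI, Mellanox DDR", 596.0,
                "InfiniPath SDR (Christian's figure)", 452.0);
        compare("Scali MPI Connect, SDR", 551.0,
                "InfiniPath (topcrunch figure)", 482.0);
        return 0;
    }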
We have performance engineers who gather this type of data and who've seen these trends on other benchmarks, and they'll be happy to correct any misconceptions, I'm certain.
Now I feel like I'm sticking my tongue out like
a shameless vendor and yet my original
discussion is not really about beating the
InfiniPath drum, which your reply insinuates.
Well, my intent was to draw the wulfers' attention to some published facts containing apples-to-apples comparisons, in an interesting discussion of RDMA vs. message passing. Given the significant (yes, I mean it) difference in latency and message rates, I was indeed surprised. My question still is: if there existed an RDMA API with characteristics similar to the best message-passing APIs, how would a good MPI implementation perform?
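To be clear about the distinction I mean -- the real question concerns the transport API underneath MPI (verbs-style RDMA versus a send/receive-style engine), but MPI's own interfaces show the same contrast in semantics -- here is a minimal sketch, purely my own illustration:

    #include <mpi.h>

    #define N 1024

    /* Run with at least two ranks, e.g. mpirun -np 2 ./a.out */
    int main(int argc, char **argv)
    {
        int rank;
        double buf[N] = {0}, exposed[N];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Two-sided message passing: both sides take part, and the
           library does the matching and any intermediate copying. */
        if (rank == 0)
            MPI_Send(buf, N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(buf, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);

        /* One-sided, RDMA-style: the origin writes straight into memory
           the target has exposed; the target is involved only in the
           synchronization (the fences), not in the data movement. */
        MPI_Win win;
        MPI_Win_create(exposed, N * sizeof(double), sizeof(double),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);
        MPI_Win_fence(0, win);
        if (rank == 0)
            MPI_Put(buf, N, MPI_DOUBLE, 1, 0, N, MPI_DOUBLE, win);
        MPI_Win_fence(0, win);
        MPI_Win_free(&win);

        MPI_Finalize();
        return 0;
    }

The question, then, is how well a good MPI implementation could do if the interconnect's native primitive were of the RDMA kind but with the latency and message rate of the best message-passing interfaces.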
Håkon
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf