I have been running the IMB performance tests and noticed some strange behavior. This is on a CentOS cluster with 16 processes per node, using the openib BTL. Currently, I am looking at MPI_Barrier performance. Since we use a recursive doubling algorithm (in the tuned collective), I would have expected to see log2(np) scaling. However, the data is much worse than log2(np), with the trunk being worse than v1.2.4.

One interesting piece of data: I replaced the tuned algorithm with a very similar one (copied from Sun ClusterTools 6). Instead of each process doing a combined send/recv with its partner, each process does a send to its lower partner followed by a receive from its upper partner; then everything is reversed, which finishes the barrier. For reasons unknown, this appears to perform better, even though both algorithms should be log2(np).
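To make the comparison concrete, here is a minimal sketch of the two message patterns as I understand them. The first is textbook recursive doubling (assuming np is a power of two; the real tuned-collective code handles the general case and is more involved). The second is my reading of the ClusterTools-style variant from the description above; the attached barrier-tree.pdf is the authoritative picture, so take the details with a grain of salt. Neither is the actual Open MPI source.

#include <mpi.h>

/* Textbook recursive doubling: in round k, each process exchanges a
 * zero-byte message with rank ^ (1 << k).  Assumes np is a power of
 * two, so every process has a partner in every round. */
static void barrier_recursive_doubling(MPI_Comm comm)
{
    int rank, np;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &np);

    for (int mask = 1; mask < np; mask <<= 1) {
        int partner = rank ^ mask;
        MPI_Sendrecv(NULL, 0, MPI_BYTE, partner, 0,
                     NULL, 0, MPI_BYTE, partner, 0,
                     comm, MPI_STATUS_IGNORE);
    }
}

/* ClusterTools-style variant (my reading, hypothetical): a downward
 * sweep in which each process sends to rank - mask and then receives
 * from rank + mask, followed by the same sweep reversed to release
 * everyone.  The send-then-receive ordering is deadlock-free here
 * because within each round the sends form an acyclic chain that
 * terminates at the low ranks, which only receive. */
static void barrier_two_phase(MPI_Comm comm)
{
    int rank, np;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &np);

    /* Phase 1: combine toward the low ranks. */
    for (int mask = 1; mask < np; mask <<= 1) {
        if (rank - mask >= 0)
            MPI_Send(NULL, 0, MPI_BYTE, rank - mask, 0, comm);
        if (rank + mask < np)
            MPI_Recv(NULL, 0, MPI_BYTE, rank + mask, 0, comm,
                     MPI_STATUS_IGNORE);
    }
    /* Phase 2: everything reversed, releasing the high ranks. */
    for (int mask = np >> 1; mask >= 1; mask >>= 1) {
        if (rank + mask < np)
            MPI_Send(NULL, 0, MPI_BYTE, rank + mask, 0, comm);
        if (rank - mask >= 0)
            MPI_Recv(NULL, 0, MPI_BYTE, rank - mask, 0, comm,
                     MPI_STATUS_IGNORE);
    }
}

Both do 2*log2(np) point-to-point messages in the worst case, which is why I would expect them to perform comparably.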

Another interesting fact is that when run over TCP on my really slow cluster (latency of 150 usec), the tuned barrier algorithm follows the expected log2(np) curve very closely.

I have mentioned this issue to a few people, but thought I would share it with a wider audience to see if anyone else has observed MPI_Barrier performance that is not log2(np). I have attached two PDFs: the first shows my results, and the second is a picture of the two different barrier algorithms.

Rolf

--

=========================
rolf.vandeva...@sun.com
781-442-3043
=========================

Attachment: imb-barrier-ompi.pdf
Description: Adobe PDF document

Attachment: barrier-tree.pdf
Description: Adobe PDF document
