Is there something up with our Parallel::max() implementation?  In a
recent run on 256 processors, each call to Parallel::max() apparently
took about 24 seconds, orders of magnitude longer than something like
gather(), which presumably involves far more communication.

(You may want to view this PerfLog table snippet in fixed-width fonts.)

| Parallel                                                                                                          |
|   allgather()                         8         0.4039      0.050487    0.4286      0.053570    0.00     0.00     |
|   broadcast()                         251       0.4242      0.001690    0.4242      0.001690    0.00     0.00     |
|   gather()                            481       0.3723      0.000774    0.3723      0.000774    0.00     0.00     |
|   max()                               125       3050.9712   24.407770   3050.9712   24.407770   11.78    11.78    |
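
As a quick sanity check on the figures in the table, here is a minimal Python sketch that recomputes the per-call averages from the call counts and total times. The dictionary and the helper name per_call_average are made up for this illustration and are not part of libMesh or PerfLog; the numbers are copied from the table rows.

```python
# Call counts and total times (seconds) copied from the PerfLog rows above.
# The layout of this dictionary is purely illustrative.
perf_rows = {
    "allgather()": (8, 0.4039),
    "broadcast()": (251, 0.4242),
    "gather()": (481, 0.3723),
    "max()": (125, 3050.9712),
}


def per_call_average(n_calls, total_seconds):
    """Return the average time per call in seconds."""
    return total_seconds / n_calls


# Recompute the average column for every row.
averages = {
    name: per_call_average(n_calls, total)
    for name, (n_calls, total) in perf_rows.items()
}

# How much slower an average max() call is than an average gather() call.
max_vs_gather = averages["max()"] / averages["gather()"]

for name, avg in averages.items():
    print(f"{name:<14s} {avg:.6f} s per call")
print(f"max() / gather() ratio: {max_vs_gather:.0f}")
```

The recomputed averages agree with the reported average column, and the max() to gather() ratio comes out above 10^4, which is what "orders of magnitude" refers to.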


I searched the devel mailing list briefly but didn't see this issue
discussed previously.

-- 
John
