Is there something up with our Parallel::max() implementation? In a
recent run on 256 processors, each call to Parallel::max() apparently
took about 24 seconds on average, orders of magnitude longer than
something like gather(), which presumably involves far more communication.
(You may want to view this PerfLog table snippet in a fixed-width font.)
| Parallel                                                                          |
|   allgather()   8     0.4039      0.050487    0.4286      0.053570    0.00  0.00  |
|   broadcast()   251   0.4242      0.001690    0.4242      0.001690    0.00  0.00  |
|   gather()      481   0.3723      0.000774    0.3723      0.000774    0.00  0.00  |
|   max()         125   3050.9712   24.407770   3050.9712   24.407770   11.78 11.78 |
I searched briefly on the devel mailing list but didn't see this issue
discussed previously.
--
John