On Thu, 4 Apr 2013, Manav Bhatia wrote:
Attached are outputs (with profiling data at the end) from a run
with 288 processors. The case with ParallelMesh is taking about 10
times as long to run.
Thanks for this!
Two things jump out at me:
Ben, why is MeshCommunication::find_global_indices() being run
O(N_proc) times in the Parmetis setup? On big runs that's sure to
leave most processors twiddling their thumbs most of the time.
Manav, your FEMSystem assembly of residuals is 10-15% slower in the
ParallelMesh case (which shouldn't happen, but might just be
measurement noise), but your FEMSystem assembly of jacobians is 6500%
slower, which *definitely* shouldn't happen and isn't measurement
noise. The only thing in FEMSystem::assembly that should be slower in
the ParallelMesh case is the traversal of the mapvector container...
but that exact same traversal is undertaken in both residual and
Jacobian cases; it shouldn't add a fraction of a second in one case
and two minutes in another. So I'm stumped. Would you litter
fem_system.C with some more START_LOG/STOP_LOG calls and see if you
can pinpoint where the Jacobians' slowdown is happening?
---
Roy