Dear Ben and Roy,

I get very different run times with "METHOD=pro" and "METHOD=dbg"; the details are in the tables below. In "dbg", the problem is always there: find_global_indices() takes about 838 s (with sub-calls) out of 958 s of active time. In "pro", the problem disappears: the same call takes about 1.8 s out of 14.3 s of active time. Any advice? In both cases I ran the code on a slave node. Thanks a lot.
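To put numbers on the gap, here is a minimal sketch that computes the dbg/pro ratio for a handful of events. The values are the "Total Time With Sub" entries (in seconds) copied from the two PerfLog tables below; the choice of events and the helper name `slowdown` are only illustrative assumptions, not part of libMesh.

```python
# "Total Time With Sub" in seconds, copied from the two PerfLog tables
# in this message. The selection of events is arbitrary (illustrative).
pro = {
    "find_global_indices()": 1.7979,
    "parallel_sort()": 0.0470,
    "partition()": 1.7711,
    "send_receive()": 0.0286,
    "solve()": 3.2520,
    "assemble()": 1.6206,
}
dbg = {
    "find_global_indices()": 837.7610,
    "parallel_sort()": 45.6040,
    "partition()": 693.1930,
    "send_receive()": 44.4968,
    "solve()": 3.9911,
    "assemble()": 33.1146,
}


def slowdown(pro_times, dbg_times):
    """Return (event, dbg/pro ratio) pairs, largest ratio first."""
    ratios = {name: dbg_times[name] / pro_times[name] for name in pro_times}
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


# Print one line per event: name, then how many times slower dbg is.
for name, ratio in slowdown(pro, dbg):
    print(f"{name:<24} {ratio:10.1f}x")
```

In this selection, solve() is close to 1x and assemble() is around 20x, which looks like an ordinary debug-build penalty, while the communication-related events (send_receive(), parallel_sort(), find_global_indices(), partition()) are slower by factors in the hundreds or more.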
In METHOD=pro:

-------------------------------------------------------------------------------------------------------
libMesh Performance: Alive time=22.2695, Active time=14.2938
-------------------------------------------------------------------------------------------------------
Event                             nCalls   Total Time Avg Time    Total Time Avg Time    % of Active Time
                                           w/o Sub    w/o Sub     With Sub   With Sub    w/o S   With S
-------------------------------------------------------------------------------------------------------

DofMap
  add_neighbors_to_send_list()    3        0.0997     0.033244    0.1065     0.035511    0.70    0.75
  build_constraint_matrix()       33576    0.0186     0.000001    0.0186     0.000001    0.13    0.13
  cnstrn_elem_mat_vec()           33576    0.0101     0.000000    0.0101     0.000000    0.07    0.07
  compute_sparsity()              3        0.2933     0.097754    0.4115     0.137154    2.05    2.88
  create_dof_constraints()        3        0.1100     0.036674    0.1637     0.054568    0.77    1.15
  distribute_dofs()               3        0.1971     0.065701    0.6517     0.217241    1.38    4.56
  dof_indices()                   424766   0.4175     0.000001    0.4175     0.000001    2.92    2.92
  enforce_constraints_exactly()   2        0.0055     0.002742    0.0055     0.002742    0.04    0.04
  old_dof_indices()               67152    0.0644     0.000001    0.0644     0.000001    0.45    0.45
  prepare_send_list()             3        0.0015     0.000506    0.0015     0.000506    0.01    0.01
  reinit()                        3        0.3766     0.125545    0.3766     0.125545    2.63    2.63

FE
  compute_affine_map()            161783   0.3852     0.000002    0.3852     0.000002    2.70    2.70
  compute_face_map()              64674    0.1622     0.000003    0.1622     0.000003    1.13    1.13
  compute_shape_functions()       161783   0.1218     0.000001    0.1218     0.000001    0.85    0.85
  init_face_shape_functions()     54129    0.2135     0.000004    0.2135     0.000004    1.49    1.49
  init_shape_functions()          115011   0.9611     0.000008    0.9611     0.000008    6.72    6.72
  inverse_map()                   519958   1.4425     0.000003    1.4425     0.000003    10.09   10.09

GMVIO
  write_nodal_data()              1        0.1555     0.155485    0.1555     0.155485    1.09    1.09

JumpErrorEstimator
  estimate_error()                2        1.0333     0.516627    3.9642     1.982106    7.23    27.73

LocationMap
  find()                          50456    0.0286     0.000001    0.0286     0.000001    0.20    0.20
  init()                          4        0.0226     0.005662    0.0226     0.005662    0.16    0.16

Mesh
  contract()                      2        0.0185     0.009264    0.0462     0.023103    0.13    0.32
  find_neighbors()                3        0.5844     0.194807    0.6276     0.209214    4.09    4.39
  read()                          1        0.2718     0.271756    0.2718     0.271756    1.90    1.90
  renumber_nodes_and_elem()       8        0.1015     0.012692    0.1015     0.012692    0.71    0.71

MeshCommunication
  broadcast_bcs()                 1        0.0012     0.001206    0.0330     0.033009    0.01    0.23
  broadcast_mesh()                1        0.0422     0.042237    0.0451     0.045126    0.30    0.32
  compute_hilbert_indices()       4        1.6419     0.410467    1.6419     0.410467    11.49   11.49
  find_global_indices()           4        0.1052     0.026299    1.7979     0.449477    0.74    12.58
  parallel_sort()                 4        0.0153     0.003821    0.0470     0.011757    0.11    0.33

MeshRefinement
  _coarsen_elements()             4        0.0276     0.006894    0.0278     0.006938    0.19    0.19
  _refine_elements()              4        0.1460     0.036506    0.2486     0.062145    1.02    1.74
  add_point()                     50456    0.0483     0.000001    0.0890     0.000002    0.34    0.62
  make_coarsening_compatible()    5        0.1862     0.037243    0.1862     0.037243    1.30    1.30
  make_refinement_compatible()    5        0.0291     0.005817    0.0309     0.006181    0.20    0.22

MetisPartitioner
  partition()                     3        0.3619     0.120617    1.7711     0.590353    2.53    12.39

Parallel
  allgather()                     16       0.0735     0.004591    0.0735     0.004591    0.51    0.51
  broadcast()                     13       0.0346     0.002663    0.0346     0.002663    0.24    0.24
  gather()                        3        0.0001     0.000029    0.0001     0.000029    0.00    0.00
  max()                           30       0.0736     0.002454    0.0736     0.002454    0.52    0.52
  min()                           16       0.0107     0.000668    0.0107     0.000668    0.07    0.07
  probe()                         26       0.0213     0.000818    0.0213     0.000818    0.15    0.15
  receive()                       26       0.0033     0.000128    0.0246     0.000947    0.02    0.17
  send()                          26       0.0035     0.000136    0.0035     0.000136    0.02    0.02
  send_receive()                  34       0.0004     0.000012    0.0286     0.000842    0.00    0.20
  sum()                           20       0.1321     0.006607    0.1321     0.006607    0.92    0.92
  wait()                          26       0.0000     0.000001    0.0000     0.000001    0.00    0.00

Partitioner
  set_node_processor_ids()        3        0.1087     0.036232    0.1282     0.042741    0.76    0.90
  set_parent_processor_ids()      3        0.0313     0.010417    0.0313     0.010417    0.22    0.22

PetscLinearSolver
  solve()                         3        3.2520     1.083999    3.2520     1.083999    22.75   22.75

ProjectVector
  operator()                      2        0.0847     0.042333    0.1667     0.083344    0.59    1.17

System
  assemble()                      3        0.6752     0.225067    1.6206     0.540184    4.72    11.34
  project_vector()                2        0.0870     0.043506    0.3003     0.150132    0.61    2.10
-------------------------------------------------------------------------------------------------------
Totals:                           1737648  14.2938                                       100.00
-------------------------------------------------------------------------------------------------------

In METHOD=dbg:

-------------------------------------------------------------------------------------------------------
libMesh Performance: Alive time=970.489, Active time=958.407
-------------------------------------------------------------------------------------------------------
Event                             nCalls   Total Time Avg Time    Total Time Avg Time    % of Active Time
                                           w/o Sub    w/o Sub     With Sub   With Sub    w/o S   With S
-------------------------------------------------------------------------------------------------------

DofMap
  add_neighbors_to_send_list()    3        0.4857     0.161916    0.5219     0.173970    0.05    0.05
  build_constraint_matrix()       33576    0.1788     0.000005    0.1788     0.000005    0.02    0.02
  cnstrn_elem_mat_vec()           33576    0.1048     0.000003    0.1048     0.000003    0.01    0.01
  compute_sparsity()              3        16.2741    5.424711    16.9691    5.656377    1.70    1.77
  create_dof_constraints()        3        0.4682     0.156067    0.6576     0.219187    0.05    0.07
  distribute_dofs()               3        1.5506     0.516868    3.0141     1.004706    0.16    0.31
  dof_indices()                   424766   2.6192     0.000006    2.6192     0.000006    0.27    0.27
  enforce_constraints_exactly()   2        0.0223     0.011134    0.0223     0.011134    0.00    0.00
  old_dof_indices()               67152    0.4018     0.000006    0.4018     0.000006    0.04    0.04
  prepare_send_list()             3        0.4080     0.135996    0.4080     0.135996    0.04    0.04
  reinit()                        3        1.1455     0.381827    1.1455     0.381827    0.12    0.12

FE
  compute_affine_map()            161783   7.5528     0.000047    7.5528     0.000047    0.79    0.79
  compute_face_map()              64674    3.3548     0.000052    3.3548     0.000052    0.35    0.35
  compute_shape_functions()       161783   14.1906    0.000088    14.1906    0.000088    1.48    1.48
  init_face_shape_functions()     54129    1.8255     0.000034    1.8255     0.000034    0.19    0.19
  init_shape_functions()          115011   11.1598    0.000097    11.1598    0.000097    1.16    1.16
  inverse_map()                   519958   3.8413     0.000007    3.8413     0.000007    0.40    0.40

GMVIO
  write_nodal_data()              1        0.5128     0.512829    0.5128     0.512829    0.05    0.05

JumpErrorEstimator
  estimate_error()                2        5.3186     2.659298    33.5234    16.761681   0.55    3.50

LocationMap
  find()                          50456    0.1874     0.000004    0.1874     0.000004    0.02    0.02
  init()                          4        0.1253     0.031314    0.1253     0.031314    0.01    0.01

Mesh
  contract()                      2        0.1673     0.083652    0.2817     0.140854    0.02    0.03
  find_neighbors()                3        7.1693     2.389767    7.6922     2.564061    0.75    0.80
  read()                          1        1.5032     1.503193    1.5032     1.503193    0.16    0.16
  renumber_nodes_and_elem()       8        0.4307     0.053843    0.4307     0.053843    0.04    0.04

MeshCommunication
  broadcast_bcs()                 1        0.0165     0.016476    0.0202     0.020218    0.00    0.00
  broadcast_mesh()                1        0.2634     0.263384    0.2666     0.266577    0.03    0.03
  compute_hilbert_indices()       4        0.9906     0.247642    0.9906     0.247642    0.10    0.10
  find_global_indices()           4        746.7912   186.697788  837.7610   209.440249  77.92   87.41
  parallel_sort()                 4        44.3904    11.097589   45.6040    11.400992   4.63    4.76

MeshRefinement
  _coarsen_elements()             4        0.1212     0.030298    0.1414     0.035350    0.01    0.01
  _refine_elements()              4        0.5802     0.145040    1.1861     0.296525    0.06    0.12
  add_point()                     50456    0.2761     0.000005    0.5098     0.000010    0.03    0.05
  make_coarsening_compatible()    11       1.9512     0.177386    1.9512     0.177386    0.20    0.20
  make_refinement_compatible()    11       0.3092     0.028110    0.3611     0.032829    0.03    0.04

MetisPartitioner
  partition()                     3        2.4345     0.811508    693.1930   231.064329  0.25    72.33

Parallel
  allgather()                     16       0.0325     0.002033    0.0325     0.002033    0.00    0.00
  broadcast()                     13       0.0066     0.000511    0.0066     0.000511    0.00    0.00
  gather()                        3        0.0001     0.000037    0.0001     0.000037    0.00    0.00
  max()                           267      0.3244     0.001215    0.3244     0.001215    0.03    0.03
  min()                           467      10.4474    0.022371    10.4474    0.022371    1.09    1.09
  probe()                         26       29.8630    1.148579    29.8630    1.148579    3.12    3.12
  receive()                       26       0.0065     0.000250    29.8696    1.148832    0.00    3.12
  send()                          26       14.6244    0.562477    14.6244    0.562477    1.53    1.53
  send_receive()                  34       0.0025     0.000073    44.4968    1.308729    0.00    4.64
  sum()                           20       1.2742     0.063712    1.2742     0.063712    0.13    0.13
  wait()                          26       0.0001     0.000004    0.0001     0.000004    0.00    0.00

Partitioner
  set_node_processor_ids()        3        0.9364     0.312139    1.1704     0.390143    0.10    0.12
  set_parent_processor_ids()      3        0.1398     0.046587    0.1398     0.046587    0.01    0.01

PetscLinearSolver
  solve()                         3        3.9902     1.330075    3.9911     1.330380    0.42    0.42

ProjectVector
  operator()                      2        0.5253     0.262668    0.9876     0.493799    0.05    0.10

System
  assemble()                      3        15.0199    5.006632    33.1146    11.038210   1.57    3.46
  project_vector()                2        2.0911     1.045574    3.3312     1.665623    0.22    0.35
-------------------------------------------------------------------------------------------------------
Totals:                           1738348  958.4074                                      100.00
-------------------------------------------------------------------------------------------------------

Regards,
Yujie

On Wed, Jan 27, 2010 at 10:42 AM, Kirk, Benjamin (JSC-EG311) <[email protected]> wrote:

> >> When I sent the following email to libmesh mail list. I met one
> >> problem because of the size of the email. Could you give me some
> >> advice regarding this problem? thanks a lot.
> >
> > It looks like it made it through eventually; just a little late.
>
> I had to approve it based on size, and it was originally sent late US time
> so I didn't get to it until this morning. This is the second approval I've
> had to make in 24 hours, I'll see if there is
>
> > I'm not sure if you'll get an answer, though. Ben is the one
> > responsible for find_global_indices, and he's swamped with other
> > things right now. It does a parallel sort, which can be very
> > sensitive to MPI implementation.
> >
> > It only gets used for I/O and the cost should scale more slowly than
> > solves, though; for large implicit 2D/3D problems it shouldn't be an
> > issue even on inefficient MPI implementations.
>
> Yes, this issue is bizarre indeed. The code does not even do that much
> communication there... You might want to compile with METHOD=pro and run it
> through gprof - that will give you finer grained granularity as to what the
> issue may actually be.
>
> Can you confirm that the problem doesn't exist on one processor? What are
> the details of the mesh you are using??
>
> -Ben

_______________________________________________
Libmesh-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/libmesh-users
