Dear Ben and Roy,

I measured very different run times with "METHOD=pro" and "METHOD=dbg"; the
details are in the tables below. With "dbg" the problem is always there, but
with "pro" it disappears. Any advice? In both cases I ran the code on a
slave node. Thanks a lot.

in METHOD=pro:
 
 -------------------------------------------------------------------------------------------------------------
| libMesh Performance: Alive time=22.2695, Active time=14.2938                                                |
 -------------------------------------------------------------------------------------------------------------
| Event                           nCalls    Total Time  Avg Time    Total Time  Avg Time    % of Active Time  |
|                                           w/o Sub     w/o Sub     With Sub    With Sub    w/o S    With S   |
|-------------------------------------------------------------------------------------------------------------|
|                                                                                                             |
|                                                                                                             |
| DofMap                                                                                                      |
|   add_neighbors_to_send_list()  3         0.0997      0.033244    0.1065      0.035511    0.70     0.75     |
|   build_constraint_matrix()     33576     0.0186      0.000001    0.0186      0.000001    0.13     0.13     |
|   cnstrn_elem_mat_vec()         33576     0.0101      0.000000    0.0101      0.000000    0.07     0.07     |
|   compute_sparsity()            3         0.2933      0.097754    0.4115      0.137154    2.05     2.88     |
|   create_dof_constraints()      3         0.1100      0.036674    0.1637      0.054568    0.77     1.15     |
|   distribute_dofs()             3         0.1971      0.065701    0.6517      0.217241    1.38     4.56     |
|   dof_indices()                 424766    0.4175      0.000001    0.4175      0.000001    2.92     2.92     |
|   enforce_constraints_exactly() 2         0.0055      0.002742    0.0055      0.002742    0.04     0.04     |
|   old_dof_indices()             67152     0.0644      0.000001    0.0644      0.000001    0.45     0.45     |
|   prepare_send_list()           3         0.0015      0.000506    0.0015      0.000506    0.01     0.01     |
|   reinit()                      3         0.3766      0.125545    0.3766      0.125545    2.63     2.63     |
|                                                                                                             |
| FE                                                                                                          |
|   compute_affine_map()          161783    0.3852      0.000002    0.3852      0.000002    2.70     2.70     |
|   compute_face_map()            64674     0.1622      0.000003    0.1622      0.000003    1.13     1.13     |
|   compute_shape_functions()     161783    0.1218      0.000001    0.1218      0.000001    0.85     0.85     |
|   init_face_shape_functions()   54129     0.2135      0.000004    0.2135      0.000004    1.49     1.49     |
|   init_shape_functions()        115011    0.9611      0.000008    0.9611      0.000008    6.72     6.72     |
|   inverse_map()                 519958    1.4425      0.000003    1.4425      0.000003    10.09    10.09    |
|                                                                                                             |
| GMVIO                                                                                                       |
|   write_nodal_data()            1         0.1555      0.155485    0.1555      0.155485    1.09     1.09     |
|                                                                                                             |
| JumpErrorEstimator                                                                                          |
|   estimate_error()              2         1.0333      0.516627    3.9642      1.982106    7.23     27.73    |
|                                                                                                             |
| LocationMap                                                                                                 |
|   find()                        50456     0.0286      0.000001    0.0286      0.000001    0.20     0.20     |
|   init()                        4         0.0226      0.005662    0.0226      0.005662    0.16     0.16     |
|                                                                                                             |
| Mesh                                                                                                        |
|   contract()                    2         0.0185      0.009264    0.0462      0.023103    0.13     0.32     |
|   find_neighbors()              3         0.5844      0.194807    0.6276      0.209214    4.09     4.39     |
|   read()                        1         0.2718      0.271756    0.2718      0.271756    1.90     1.90     |
|   renumber_nodes_and_elem()     8         0.1015      0.012692    0.1015      0.012692    0.71     0.71     |
|                                                                                                             |
| MeshCommunication                                                                                           |
|   broadcast_bcs()               1         0.0012      0.001206    0.0330      0.033009    0.01     0.23     |
|   broadcast_mesh()              1         0.0422      0.042237    0.0451      0.045126    0.30     0.32     |
|   compute_hilbert_indices()     4         1.6419      0.410467    1.6419      0.410467    11.49    11.49    |
|   find_global_indices()         4         0.1052      0.026299    1.7979      0.449477    0.74     12.58    |
|   parallel_sort()               4         0.0153      0.003821    0.0470      0.011757    0.11     0.33     |
|                                                                                                             |
| MeshRefinement                                                                                              |
|   _coarsen_elements()           4         0.0276      0.006894    0.0278      0.006938    0.19     0.19     |
|   _refine_elements()            4         0.1460      0.036506    0.2486      0.062145    1.02     1.74     |
|   add_point()                   50456     0.0483      0.000001    0.0890      0.000002    0.34     0.62     |
|   make_coarsening_compatible()  5         0.1862      0.037243    0.1862      0.037243    1.30     1.30     |
|   make_refinement_compatible()  5         0.0291      0.005817    0.0309      0.006181    0.20     0.22     |
|                                                                                                             |
| MetisPartitioner                                                                                            |
|   partition()                   3         0.3619      0.120617    1.7711      0.590353    2.53     12.39    |
|                                                                                                             |
| Parallel                                                                                                    |
|   allgather()                   16        0.0735      0.004591    0.0735      0.004591    0.51     0.51     |
|   broadcast()                   13        0.0346      0.002663    0.0346      0.002663    0.24     0.24     |
|   gather()                      3         0.0001      0.000029    0.0001      0.000029    0.00     0.00     |
|   max()                         30        0.0736      0.002454    0.0736      0.002454    0.52     0.52     |
|   min()                         16        0.0107      0.000668    0.0107      0.000668    0.07     0.07     |
|   probe()                       26        0.0213      0.000818    0.0213      0.000818    0.15     0.15     |
|   receive()                     26        0.0033      0.000128    0.0246      0.000947    0.02     0.17     |
|   send()                        26        0.0035      0.000136    0.0035      0.000136    0.02     0.02     |
|   send_receive()                34        0.0004      0.000012    0.0286      0.000842    0.00     0.20     |
|   sum()                         20        0.1321      0.006607    0.1321      0.006607    0.92     0.92     |
|   wait()                        26        0.0000      0.000001    0.0000      0.000001    0.00     0.00     |
|                                                                                                             |
| Partitioner                                                                                                 |
|   set_node_processor_ids()      3         0.1087      0.036232    0.1282      0.042741    0.76     0.90     |
|   set_parent_processor_ids()    3         0.0313      0.010417    0.0313      0.010417    0.22     0.22     |
|                                                                                                             |
| PetscLinearSolver                                                                                           |
|   solve()                       3         3.2520      1.083999    3.2520      1.083999    22.75    22.75    |
|                                                                                                             |
| ProjectVector                                                                                               |
|   operator()                    2         0.0847      0.042333    0.1667      0.083344    0.59     1.17     |
|                                                                                                             |
| System                                                                                                      |
|   assemble()                    3         0.6752      0.225067    1.6206      0.540184    4.72     11.34    |
|   project_vector()              2         0.0870      0.043506    0.3003      0.150132    0.61     2.10     |
 -------------------------------------------------------------------------------------------------------------
| Totals:                         1737648   14.2938                                         100.00            |
 -------------------------------------------------------------------------------------------------------------

in METHOD=dbg:

 
 -------------------------------------------------------------------------------------------------------------
| libMesh Performance: Alive time=970.489, Active time=958.407                                                |
 -------------------------------------------------------------------------------------------------------------
| Event                           nCalls    Total Time  Avg Time    Total Time  Avg Time    % of Active Time  |
|                                           w/o Sub     w/o Sub     With Sub    With Sub    w/o S    With S   |
|-------------------------------------------------------------------------------------------------------------|
|                                                                                                             |
|                                                                                                             |
| DofMap                                                                                                      |
|   add_neighbors_to_send_list()  3         0.4857      0.161916    0.5219      0.173970    0.05     0.05     |
|   build_constraint_matrix()     33576     0.1788      0.000005    0.1788      0.000005    0.02     0.02     |
|   cnstrn_elem_mat_vec()         33576     0.1048      0.000003    0.1048      0.000003    0.01     0.01     |
|   compute_sparsity()            3         16.2741     5.424711    16.9691     5.656377    1.70     1.77     |
|   create_dof_constraints()      3         0.4682      0.156067    0.6576      0.219187    0.05     0.07     |
|   distribute_dofs()             3         1.5506      0.516868    3.0141      1.004706    0.16     0.31     |
|   dof_indices()                 424766    2.6192      0.000006    2.6192      0.000006    0.27     0.27     |
|   enforce_constraints_exactly() 2         0.0223      0.011134    0.0223      0.011134    0.00     0.00     |
|   old_dof_indices()             67152     0.4018      0.000006    0.4018      0.000006    0.04     0.04     |
|   prepare_send_list()           3         0.4080      0.135996    0.4080      0.135996    0.04     0.04     |
|   reinit()                      3         1.1455      0.381827    1.1455      0.381827    0.12     0.12     |
|                                                                                                             |
| FE                                                                                                          |
|   compute_affine_map()          161783    7.5528      0.000047    7.5528      0.000047    0.79     0.79     |
|   compute_face_map()            64674     3.3548      0.000052    3.3548      0.000052    0.35     0.35     |
|   compute_shape_functions()     161783    14.1906     0.000088    14.1906     0.000088    1.48     1.48     |
|   init_face_shape_functions()   54129     1.8255      0.000034    1.8255      0.000034    0.19     0.19     |
|   init_shape_functions()        115011    11.1598     0.000097    11.1598     0.000097    1.16     1.16     |
|   inverse_map()                 519958    3.8413      0.000007    3.8413      0.000007    0.40     0.40     |
|                                                                                                             |
| GMVIO                                                                                                       |
|   write_nodal_data()            1         0.5128      0.512829    0.5128      0.512829    0.05     0.05     |
|                                                                                                             |
| JumpErrorEstimator                                                                                          |
|   estimate_error()              2         5.3186      2.659298    33.5234     16.761681   0.55     3.50     |
|                                                                                                             |
| LocationMap                                                                                                 |
|   find()                        50456     0.1874      0.000004    0.1874      0.000004    0.02     0.02     |
|   init()                        4         0.1253      0.031314    0.1253      0.031314    0.01     0.01     |
|                                                                                                             |
| Mesh                                                                                                        |
|   contract()                    2         0.1673      0.083652    0.2817      0.140854    0.02     0.03     |
|   find_neighbors()              3         7.1693      2.389767    7.6922      2.564061    0.75     0.80     |
|   read()                        1         1.5032      1.503193    1.5032      1.503193    0.16     0.16     |
|   renumber_nodes_and_elem()     8         0.4307      0.053843    0.4307      0.053843    0.04     0.04     |
|                                                                                                             |
| MeshCommunication                                                                                           |
|   broadcast_bcs()               1         0.0165      0.016476    0.0202      0.020218    0.00     0.00     |
|   broadcast_mesh()              1         0.2634      0.263384    0.2666      0.266577    0.03     0.03     |
|   compute_hilbert_indices()     4         0.9906      0.247642    0.9906      0.247642    0.10     0.10     |
|   find_global_indices()         4         746.7912    186.697788  837.7610    209.440249  77.92    87.41    |
|   parallel_sort()               4         44.3904     11.097589   45.6040     11.400992   4.63     4.76     |
|                                                                                                             |
| MeshRefinement                                                                                              |
|   _coarsen_elements()           4         0.1212      0.030298    0.1414      0.035350    0.01     0.01     |
|   _refine_elements()            4         0.5802      0.145040    1.1861      0.296525    0.06     0.12     |
|   add_point()                   50456     0.2761      0.000005    0.5098      0.000010    0.03     0.05     |
|   make_coarsening_compatible()  11        1.9512      0.177386    1.9512      0.177386    0.20     0.20     |
|   make_refinement_compatible()  11        0.3092      0.028110    0.3611      0.032829    0.03     0.04     |
|                                                                                                             |
| MetisPartitioner                                                                                            |
|   partition()                   3         2.4345      0.811508    693.1930    231.064329  0.25     72.33    |
|                                                                                                             |
| Parallel                                                                                                    |
|   allgather()                   16        0.0325      0.002033    0.0325      0.002033    0.00     0.00     |
|   broadcast()                   13        0.0066      0.000511    0.0066      0.000511    0.00     0.00     |
|   gather()                      3         0.0001      0.000037    0.0001      0.000037    0.00     0.00     |
|   max()                         267       0.3244      0.001215    0.3244      0.001215    0.03     0.03     |
|   min()                         467       10.4474     0.022371    10.4474     0.022371    1.09     1.09     |
|   probe()                       26        29.8630     1.148579    29.8630     1.148579    3.12     3.12     |
|   receive()                     26        0.0065      0.000250    29.8696     1.148832    0.00     3.12     |
|   send()                        26        14.6244     0.562477    14.6244     0.562477    1.53     1.53     |
|   send_receive()                34        0.0025      0.000073    44.4968     1.308729    0.00     4.64     |
|   sum()                         20        1.2742      0.063712    1.2742      0.063712    0.13     0.13     |
|   wait()                        26        0.0001      0.000004    0.0001      0.000004    0.00     0.00     |
|                                                                                                             |
| Partitioner                                                                                                 |
|   set_node_processor_ids()      3         0.9364      0.312139    1.1704      0.390143    0.10     0.12     |
|   set_parent_processor_ids()    3         0.1398      0.046587    0.1398      0.046587    0.01     0.01     |
|                                                                                                             |
| PetscLinearSolver                                                                                           |
|   solve()                       3         3.9902      1.330075    3.9911      1.330380    0.42     0.42     |
|                                                                                                             |
| ProjectVector                                                                                               |
|   operator()                    2         0.5253      0.262668    0.9876      0.493799    0.05     0.10     |
|                                                                                                             |
| System                                                                                                      |
|   assemble()                    3         15.0199     5.006632    33.1146     11.038210   1.57     3.46     |
|   project_vector()              2         2.0911      1.045574    3.3312      1.665623    0.22     0.35     |
 -------------------------------------------------------------------------------------------------------------
| Totals:                         1738348   958.4074                                        100.00            |
 -------------------------------------------------------------------------------------------------------------
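
One way to read the two tables side by side is to compute, per event, the
ratio of the dbg time to the pro time. Below is a minimal Python sketch of
that comparison. It is not part of libMesh; the numbers are the "Total Time
w/o Sub" entries copied by hand for a handful of events from the tables
above, and the helper name slowdowns() is made up for illustration.

```python
# "Total Time w/o Sub" in seconds, copied by hand from the two tables above.
# Only a few events are included; extend the dicts as needed.
pro = {
    "find_global_indices()": 0.1052,
    "parallel_sort()": 0.0153,
    "compute_hilbert_indices()": 1.6419,
    "compute_sparsity()": 0.2933,
    "assemble()": 0.6752,
    "solve()": 3.2520,
}
dbg = {
    "find_global_indices()": 746.7912,
    "parallel_sort()": 44.3904,
    "compute_hilbert_indices()": 0.9906,
    "compute_sparsity()": 16.2741,
    "assemble()": 15.0199,
    "solve()": 3.9902,
}


def slowdowns(fast, slow):
    """Return (event, slow/fast ratio) pairs, largest ratio first.

    Hypothetical helper: events missing from either table are skipped.
    """
    ratios = {name: slow[name] / fast[name] for name in fast if name in slow}
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


ranked = slowdowns(pro, dbg)
for name, ratio in ranked:
    print(f"{name:28s} {ratio:10.1f}x")
```

On these numbers the slowdown is far from uniform: solve() is nearly
unchanged, while find_global_indices() and parallel_sort() are slower by
three orders of magnitude, so the extra dbg time is concentrated in those
two events.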

Regards,
Yujie

On Wed, Jan 27, 2010 at 10:42 AM, Kirk, Benjamin (JSC-EG311) <
[email protected]> wrote:

> >> When I sent the following email to libmesh mail list. I met one
> >> problem because of the size of the email. Could you give me some
> >> advice regarding this problem? thanks a lot.
> >
> > It looks like it made it through eventually; just a little late.
>
> I had to approve it based on size, and it was originally sent late US time
> so I didn't get to it until this morning.  This is the second approval I've
> had to make in 24 hours, I'll see if there is
>
> > I'm not sure if you'll get an answer, though.  Ben is the one
> > responsible for find_global_indices, and he's swamped with other
> > things right now.  It does a parallel sort, which can be very
> > sensitive to MPI implementation.
>
> > It only gets used for I/O and the cost should scale more slowly than
> > solves, though; for large implicit 2D/3D problems it shouldn't be an
> > issue even on inefficient MPI implementations.
>
> Yes, this issue is bizarre indeed.  The code does not even do that much
> communication there... You might want to compile with METHOD=pro and run it
> through gprof - that will give you finer-grained detail on what the
> issue may actually be.
>
> Can you confirm that the problem doesn't exist on one processor?  What are
> the details of the mesh you are using?
>
> -Ben
>
>