Hi Gilles,

I confirmed that vader is used when I don't specify any BTL, as you pointed
out!
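
For completeness, the vader path can also be pinned explicitly and compared
against the self,sm numbers quoted below (a sketch along the lines of the
other commands in this thread, not a run I am reporting results for):

[mishima@manage OMB-3.1.1-openmpi2.0.0]$ mpirun -np 2 -mca pml ob1 -mca btl
self,vader -bind-to core -report-bindings osu_bw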

Regards,
Tetsuya Mishima

[mishima@manage OMB-3.1.1-openmpi2.0.0]$ mpirun -np 2 --mca
btl_base_verbose 10 -bind-to core -report-bindings osu_bw
[manage.cluster:20006] MCW rank 0 bound to socket 0[core 0[hwt 0]]:
[B/././././.][./././././.]
[manage.cluster:20006] MCW rank 1 bound to socket 0[core 1[hwt 0]]:
[./B/./././.][./././././.]
[manage.cluster:20011] mca: base: components_register: registering
framework btl components
[manage.cluster:20011] mca: base: components_register: found loaded
component self
[manage.cluster:20011] mca: base: components_register: component self
register function successful
[manage.cluster:20011] mca: base: components_register: found loaded
component vader
[manage.cluster:20011] mca: base: components_register: component vader
register function successful
[manage.cluster:20011] mca: base: components_register: found loaded
component tcp
[manage.cluster:20011] mca: base: components_register: component tcp
register function successful
[manage.cluster:20011] mca: base: components_register: found loaded
component sm
[manage.cluster:20011] mca: base: components_register: component sm
register function successful
[manage.cluster:20011] mca: base: components_register: found loaded
component openib
[manage.cluster:20011] mca: base: components_register: component openib
register function successful
[manage.cluster:20011] mca: base: components_open: opening btl components
[manage.cluster:20011] mca: base: components_open: found loaded component
self
[manage.cluster:20011] mca: base: components_open: component self open
function successful
[manage.cluster:20011] mca: base: components_open: found loaded component
vader
[manage.cluster:20011] mca: base: components_open: component vader open
function successful
[manage.cluster:20011] mca: base: components_open: found loaded component
tcp
[manage.cluster:20011] mca: base: components_open: component tcp open
function successful
[manage.cluster:20011] mca: base: components_open: found loaded component
sm
[manage.cluster:20011] mca: base: components_open: component sm open
function successful
[manage.cluster:20011] mca: base: components_open: found loaded component
openib
[manage.cluster:20011] mca: base: components_open: component openib open
function successful
[manage.cluster:20011] select: initializing btl component self
[manage.cluster:20011] select: init of component self returned success
[manage.cluster:20011] select: initializing btl component vader
[manage.cluster:20011] select: init of component vader returned success
[manage.cluster:20011] select: initializing btl component tcp
[manage.cluster:20011] select: init of component tcp returned success
[manage.cluster:20011] select: initializing btl component sm
[manage.cluster:20011] select: init of component sm returned success
[manage.cluster:20011] select: initializing btl component openib
[manage.cluster:20011] Checking distance from this process to device=mthca0
[manage.cluster:20011] hwloc_distances->nbobjs=2
[manage.cluster:20011] hwloc_distances->latency[0]=1.000000
[manage.cluster:20011] hwloc_distances->latency[1]=1.600000
[manage.cluster:20011] hwloc_distances->latency[2]=1.600000
[manage.cluster:20011] hwloc_distances->latency[3]=1.000000
[manage.cluster:20011] ibv_obj->type set to NULL
[manage.cluster:20011] Process is bound: distance to device is 0.000000
[manage.cluster:20012] mca: base: components_register: registering
framework btl components
[manage.cluster:20012] mca: base: components_register: found loaded
component self
[manage.cluster:20012] mca: base: components_register: component self
register function successful
[manage.cluster:20012] mca: base: components_register: found loaded
component vader
[manage.cluster:20012] mca: base: components_register: component vader
register function successful
[manage.cluster:20012] mca: base: components_register: found loaded
component tcp
[manage.cluster:20012] mca: base: components_register: component tcp
register function successful
[manage.cluster:20012] mca: base: components_register: found loaded
component sm
[manage.cluster:20012] mca: base: components_register: component sm
register function successful
[manage.cluster:20012] mca: base: components_register: found loaded
component openib
[manage.cluster:20012] mca: base: components_register: component openib
register function successful
[manage.cluster:20012] mca: base: components_open: opening btl components
[manage.cluster:20012] mca: base: components_open: found loaded component
self
[manage.cluster:20012] mca: base: components_open: component self open
function successful
[manage.cluster:20012] mca: base: components_open: found loaded component
vader
[manage.cluster:20012] mca: base: components_open: component vader open
function successful
[manage.cluster:20012] mca: base: components_open: found loaded component
tcp
[manage.cluster:20012] mca: base: components_open: component tcp open
function successful
[manage.cluster:20012] mca: base: components_open: found loaded component
sm
[manage.cluster:20012] mca: base: components_open: component sm open
function successful
[manage.cluster:20012] mca: base: components_open: found loaded component
openib
[manage.cluster:20012] mca: base: components_open: component openib open
function successful
[manage.cluster:20012] select: initializing btl component self
[manage.cluster:20012] select: init of component self returned success
[manage.cluster:20012] select: initializing btl component vader
[manage.cluster:20012] select: init of component vader returned success
[manage.cluster:20012] select: initializing btl component tcp
[manage.cluster:20012] select: init of component tcp returned success
[manage.cluster:20012] select: initializing btl component sm
[manage.cluster:20012] select: init of component sm returned success
[manage.cluster:20012] select: initializing btl component openib
[manage.cluster:20012] Checking distance from this process to device=mthca0
[manage.cluster:20012] hwloc_distances->nbobjs=2
[manage.cluster:20012] hwloc_distances->latency[0]=1.000000
[manage.cluster:20012] hwloc_distances->latency[1]=1.600000
[manage.cluster:20012] hwloc_distances->latency[2]=1.600000
[manage.cluster:20012] hwloc_distances->latency[3]=1.000000
[manage.cluster:20012] ibv_obj->type set to NULL
[manage.cluster:20012] Process is bound: distance to device is 0.000000
[manage.cluster:20012] openib BTL: rdmacm CPC unavailable for use on
mthca0:1; skipped
[manage.cluster:20011] openib BTL: rdmacm CPC unavailable for use on
mthca0:1; skipped
[manage.cluster:20012] [rank=1] openib: using port mthca0:1
[manage.cluster:20012] select: init of component openib returned success
[manage.cluster:20011] [rank=0] openib: using port mthca0:1
[manage.cluster:20011] select: init of component openib returned success
[manage.cluster:20012] mca: bml: Using self btl for send to [[16477,1],1]
on node manage
[manage.cluster:20011] mca: bml: Using self btl for send to [[16477,1],0]
on node manage
[manage.cluster:20012] mca: bml: Using vader btl for send to [[16477,1],0]
on node manage
[manage.cluster:20011] mca: bml: Using vader btl for send to [[16477,1],1]
on node manage
# OSU MPI Bandwidth Test v3.1.1
# Size        Bandwidth (MB/s)
1                         1.42
2                         3.04
4                         6.06
8                        12.11
16                       24.32
32                       47.78
64                       85.57
128                     139.08
256                     240.59
512                     415.78
1024                    848.47
2048                   1234.08
4096                    265.53
8192                    471.28
16384                   740.52
32768                  1029.48
65536                  1191.29
131072                 1271.51
262144                 1238.58
524288                 1246.67
1048576                1263.01
2097152                1275.67
4194304                1281.87
[manage.cluster:20011] mca: base: close: component self closed
[manage.cluster:20011] mca: base: close: unloading component self
[manage.cluster:20012] mca: base: close: component self closed
[manage.cluster:20012] mca: base: close: unloading component self
[manage.cluster:20012] mca: base: close: component vader closed
[manage.cluster:20012] mca: base: close: unloading component vader
[manage.cluster:20011] mca: base: close: component vader closed
[manage.cluster:20011] mca: base: close: unloading component vader
[manage.cluster:20012] mca: base: close: component tcp closed
[manage.cluster:20012] mca: base: close: unloading component tcp
[manage.cluster:20011] mca: base: close: component tcp closed
[manage.cluster:20011] mca: base: close: unloading component tcp
[manage.cluster:20011] mca: base: close: component sm closed
[manage.cluster:20011] mca: base: close: unloading component sm
[manage.cluster:20012] mca: base: close: component sm closed
[manage.cluster:20012] mca: base: close: unloading component sm
[manage.cluster:20011] mca: base: close: component openib closed
[manage.cluster:20011] mca: base: close: unloading component openib
[manage.cluster:20012] mca: base: close: component openib closed
[manage.cluster:20012] mca: base: close: unloading component openib
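
The lines that actually answer the question are the two "mca: bml: Using
vader btl for send to ..." entries above; everything else is component
registration and teardown. To pull out just those lines on a future run,
something like this should work (a sketch; it assumes a standard grep, and
the 2>&1 simply merges stderr into the pipe in case the verbose output goes
there):

[mishima@manage OMB-3.1.1-openmpi2.0.0]$ mpirun -np 2 --mca btl_base_verbose 10 -bind-to core osu_bw 2>&1 | grep "Using .* btl"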


On 2016/07/27 9:23:34, "devel" wrote in "Re: [OMPI devel] sm BTL
performance of the openmpi-2.0.0":
> Also, btl/vader has a higher exclusivity than btl/sm, so if you do not
> manually specify any btl, vader should be used.
>
>
> you can run with
>
> --mca btl_base_verbose 10
>
> to confirm which btl is used
>
>
> Cheers,
>
>
> Gilles
>
>
> On 7/27/2016 9:20 AM, Nathan Hjelm wrote:
> > sm is deprecated in 2.0.0 and will likely be removed in favor of vader
> > in 2.1.0.
> >
> > This issue is probably this known issue:
> > https://github.com/open-mpi/ompi-release/pull/1250
> >
> > Please apply those commits and see if it fixes the issue for you.
> >
> > -Nathan
> >
> >> On Jul 26, 2016, at 6:17 PM, tmish...@jcity.maeda.co.jp wrote:
> >>
> >> Hi Gilles,
> >>
> >> Thanks. I ran again with --mca pml ob1 but I've got the same results
> >> as below:
> >>
> >> [mishima@manage OMB-3.1.1-openmpi2.0.0]$ mpirun -np 2 -mca pml ob1
> >> -bind-to core -report-bindings osu_bw
> >> [manage.cluster:18142] MCW rank 0 bound to socket 0[core 0[hwt 0]]:
> >> [B/././././.][./././././.]
> >> [manage.cluster:18142] MCW rank 1 bound to socket 0[core 1[hwt 0]]:
> >> [./B/./././.][./././././.]
> >> # OSU MPI Bandwidth Test v3.1.1
> >> # Size        Bandwidth (MB/s)
> >> 1                         1.48
> >> 2                         3.07
> >> 4                         6.26
> >> 8                        12.53
> >> 16                       24.33
> >> 32                       49.03
> >> 64                       83.46
> >> 128                     132.60
> >> 256                     234.96
> >> 512                     420.86
> >> 1024                    842.37
> >> 2048                   1231.65
> >> 4096                    264.67
> >> 8192                    472.16
> >> 16384                   740.42
> >> 32768                  1030.39
> >> 65536                  1191.16
> >> 131072                 1269.45
> >> 262144                 1238.33
> >> 524288                 1247.97
> >> 1048576                1257.96
> >> 2097152                1274.74
> >> 4194304                1280.94
> >> [mishima@manage OMB-3.1.1-openmpi2.0.0]$ mpirun -np 2 -mca pml ob1
> >> -mca btl self,sm -bind-to core -report-bindings osu_bw
> >> [manage.cluster:18204] MCW rank 0 bound to socket 0[core 0[hwt 0]]:
> >> [B/././././.][./././././.]
> >> [manage.cluster:18204] MCW rank 1 bound to socket 0[core 1[hwt 0]]:
> >> [./B/./././.][./././././.]
> >> # OSU MPI Bandwidth Test v3.1.1
> >> # Size        Bandwidth (MB/s)
> >> 1                         0.52
> >> 2                         1.05
> >> 4                         2.08
> >> 8                         4.18
> >> 16                        8.21
> >> 32                       16.65
> >> 64                       32.60
> >> 128                      66.70
> >> 256                     132.45
> >> 512                     269.27
> >> 1024                    504.63
> >> 2048                    819.76
> >> 4096                    874.54
> >> 8192                   1447.11
> >> 16384                  2263.28
> >> 32768                  3236.85
> >> 65536                  3567.34
> >> 131072                 3555.17
> >> 262144                 3455.76
> >> 524288                 3441.80
> >> 1048576                3505.30
> >> 2097152                3534.01
> >> 4194304                3546.94
> >> [mishima@manage OMB-3.1.1-openmpi2.0.0]$ mpirun -np 2 -mca pml ob1
> >> -mca btl self,sm,openib -bind-to core -report-bindings osu_bw
> >> [manage.cluster:18218] MCW rank 0 bound to socket 0[core 0[hwt 0]]:
> >> [B/././././.][./././././.]
> >> [manage.cluster:18218] MCW rank 1 bound to socket 0[core 1[hwt 0]]:
> >> [./B/./././.][./././././.]
> >> # OSU MPI Bandwidth Test v3.1.1
> >> # Size        Bandwidth (MB/s)
> >> 1                         0.51
> >> 2                         1.03
> >> 4                         2.05
> >> 8                         4.07
> >> 16                        8.14
> >> 32                       16.32
> >> 64                       32.98
> >> 128                      63.70
> >> 256                     126.66
> >> 512                     252.61
> >> 1024                    480.22
> >> 2048                    810.54
> >> 4096                    290.61
> >> 8192                    512.49
> >> 16384                   764.60
> >> 32768                  1036.81
> >> 65536                  1182.81
> >> 131072                 1264.48
> >> 262144                 1235.82
> >> 524288                 1246.70
> >> 1048576                1254.66
> >> 2097152                1274.64
> >> 4194304                1280.65
> >> [mishima@manage OMB-3.1.1-openmpi2.0.0]$ mpirun -np 2 -mca pml ob1
> >> -mca btl self,openib -bind-to core -report-bindings osu_bw
> >> [manage.cluster:18276] MCW rank 0 bound to socket 0[core 0[hwt 0]]:
> >> [B/././././.][./././././.]
> >> [manage.cluster:18276] MCW rank 1 bound to socket 0[core 1[hwt 0]]:
> >> [./B/./././.][./././././.]
> >> # OSU MPI Bandwidth Test v3.1.1
> >> # Size        Bandwidth (MB/s)
> >> 1                         0.54
> >> 2                         1.08
> >> 4                         2.18
> >> 8                         4.33
> >> 16                        8.69
> >> 32                       17.39
> >> 64                       34.34
> >> 128                      66.28
> >> 256                     130.36
> >> 512                     241.81
> >> 1024                    429.86
> >> 2048                    553.44
> >> 4096                    707.14
> >> 8192                    879.60
> >> 16384                   763.02
> >> 32768                  1042.89
> >> 65536                  1185.45
> >> 131072                 1267.56
> >> 262144                 1227.41
> >> 524288                 1244.61
> >> 1048576                1255.66
> >> 2097152                1273.55
> >> 4194304                1281.05
> >>
> >>
> >> On 2016/07/27 9:02:49, "devel" wrote in "Re: [OMPI devel] sm BTL
> >> performance of the openmpi-2.0.0":
> >>> Hi,
> >>>
> >>>
> >>> can you please run again with
> >>>
> >>> --mca pml ob1
> >>>
> >>>
> >>> if Open MPI was built with mxm support, pml/cm and mtl/mxm are used
> >>> instead of pml/ob1 and btl/openib
> >>>
> >>>
> >>> Cheers,
> >>>
> >>>
> >>> Gilles
> >>>
> >>>
> >>> On 7/27/2016 8:56 AM, tmish...@jcity.maeda.co.jp wrote:
> >>>> Hi folks,
> >>>>
> >>>> I saw a performance degradation of openmpi-2.0.0 when I ran our
> >>>> application on a node (12 cores), so I did 4 tests using osu_bw as below:
> >>>>
> >>>> 1: mpirun -np 2 osu_bw                           bad (30% of test 2)
> >>>> 2: mpirun -np 2 -mca btl self,sm osu_bw          good (same as openmpi-1.10.3)
> >>>> 3: mpirun -np 2 -mca btl self,sm,openib osu_bw   bad (30% of test 2)
> >>>> 4: mpirun -np 2 -mca btl self,openib osu_bw      bad (30% of test 2)
> >>>>
> >>>> I guess the openib BTL was used in tests 1 and 3, because those
> >>>> results are almost the same as test 4. I believe the sm BTL should be
> >>>> used even in tests 1 and 3, because its priority is higher than
> >>>> openib's. Unfortunately, at the moment, I couldn't figure out the root
> >>>> cause, so could someone please take care of it?
> >>>>
> >>>> Regards,
> >>>> Tetsuya Mishima
> >>>>
> >>>> P.S. Here I attached these test results.
> >>>>
> >>>> [mishima@manage OMB-3.1.1-openmpi2.0.0]$ mpirun -np 2 -bind-to core
> >>>> -report-bindings osu_bw
> >>>> [manage.cluster:13389] MCW rank 0 bound to socket 0[core 0[hwt 0]]:
> >>>> [B/././././.][./././././.]
> >>>> [manage.cluster:13389] MCW rank 1 bound to socket 0[core 1[hwt 0]]:
> >>>> [./B/./././.][./././././.]
> >>>> # OSU MPI Bandwidth Test v3.1.1
> >>>> # Size        Bandwidth (MB/s)
> >>>> 1                         1.49
> >>>> 2                         3.04
> >>>> 4                         6.13
> >>>> 8                        12.23
> >>>> 16                       25.01
> >>>> 32                       49.96
> >>>> 64                       87.07
> >>>> 128                     138.87
> >>>> 256                     245.97
> >>>> 512                     423.30
> >>>> 1024                    865.85
> >>>> 2048                   1279.63
> >>>> 4096                    264.79
> >>>> 8192                    473.92
> >>>> 16384                   739.27
> >>>> 32768                  1030.49
> >>>> 65536                  1190.21
> >>>> 131072                 1270.77
> >>>> 262144                 1238.74
> >>>> 524288                 1245.97
> >>>> 1048576                1260.09
> >>>> 2097152                1274.53
> >>>> 4194304                1285.07
> >>>> [mishima@manage OMB-3.1.1-openmpi2.0.0]$ mpirun -np 2 -mca btl self,sm
> >>>> -bind-to core -report-bindings osu_bw
> >>>> [manage.cluster:13448] MCW rank 0 bound to socket 0[core 0[hwt 0]]:
> >>>> [B/././././.][./././././.]
> >>>> [manage.cluster:13448] MCW rank 1 bound to socket 0[core 1[hwt 0]]:
> >>>> [./B/./././.][./././././.]
> >>>> # OSU MPI Bandwidth Test v3.1.1
> >>>> # Size        Bandwidth (MB/s)
> >>>> 1                         0.51
> >>>> 2                         1.01
> >>>> 4                         2.03
> >>>> 8                         4.08
> >>>> 16                        7.92
> >>>> 32                       16.16
> >>>> 64                       32.53
> >>>> 128                      64.30
> >>>> 256                     128.19
> >>>> 512                     256.48
> >>>> 1024                    468.62
> >>>> 2048                    785.29
> >>>> 4096                    854.78
> >>>> 8192                   1404.51
> >>>> 16384                  2249.20
> >>>> 32768                  3136.40
> >>>> 65536                  3495.84
> >>>> 131072                 3436.69
> >>>> 262144                 3392.11
> >>>> 524288                 3400.07
> >>>> 1048576                3460.60
> >>>> 2097152                3488.09
> >>>> 4194304                3498.45
> >>>> [mishima@manage OMB-3.1.1-openmpi2.0.0]$ mpirun -np 2 -mca btl
> >>>> self,sm,openib -bind-to core -report-bindings osu_bw
> >>>> [manage.cluster:13462] MCW rank 0 bound to socket 0[core 0[hwt 0]]:
> >>>> [B/././././.][./././././.]
> >>>> [manage.cluster:13462] MCW rank 1 bound to socket 0[core 1[hwt 0]]:
> >>>> [./B/./././.][./././././.]
> >>>> # OSU MPI Bandwidth Test v3.1.1
> >>>> # Size        Bandwidth (MB/s)
> >>>> 1                         0.54
> >>>> 2                         1.09
> >>>> 4                         2.18
> >>>> 8                         4.37
> >>>> 16                        8.75
> >>>> 32                       17.37
> >>>> 64                       34.67
> >>>> 128                      66.66
> >>>> 256                     132.55
> >>>> 512                     261.52
> >>>> 1024                    489.51
> >>>> 2048                    818.38
> >>>> 4096                    290.48
> >>>> 8192                    511.64
> >>>> 16384                   765.24
> >>>> 32768                  1043.28
> >>>> 65536                  1180.48
> >>>> 131072                 1261.41
> >>>> 262144                 1232.86
> >>>> 524288                 1245.70
> >>>> 1048576                1245.69
> >>>> 2097152                1268.67
> >>>> 4194304                1281.33
> >>>> [mishima@manage OMB-3.1.1-openmpi2.0.0]$ mpirun -np 2 -mca btl
> >>>> self,openib -bind-to core -report-bindings osu_bw
> >>>> [manage.cluster:13521] MCW rank 0 bound to socket 0[core 0[hwt 0]]:
> >>>> [B/././././.][./././././.]
> >>>> [manage.cluster:13521] MCW rank 1 bound to socket 0[core 1[hwt 0]]:
> >>>> [./B/./././.][./././././.]
> >>>> # OSU MPI Bandwidth Test v3.1.1
> >>>> # Size        Bandwidth (MB/s)
> >>>> 1                         0.54
> >>>> 2                         1.08
> >>>> 4                         2.16
> >>>> 8                         4.34
> >>>> 16                        8.64
> >>>> 32                       17.25
> >>>> 64                       34.30
> >>>> 128                      66.13
> >>>> 256                     129.99
> >>>> 512                     242.26
> >>>> 1024                    429.24
> >>>> 2048                    556.00
> >>>> 4096                    706.80
> >>>> 8192                    874.35
> >>>> 16384                   762.60
> >>>> 32768                  1039.61
> >>>> 65536                  1184.03
> >>>> 131072                 1267.09
> >>>> 262144                 1230.76
> >>>> 524288                 1246.92
> >>>> 1048576                1255.88
> >>>> 2097152                1274.54
> >>>> 4194304                1281.63
> >>>> _______________________________________________
> >>>> devel mailing list
> >>>> de...@open-mpi.org
> >>>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> >>>> Link to this post:
> >>>> http://www.open-mpi.org/community/lists/devel/2016/07/19288.php
> >>> _______________________________________________
> >>> devel mailing list
> >>> de...@open-mpi.org
> >>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> >>> Link to this post:
> >>> http://www.open-mpi.org/community/lists/devel/2016/07/19289.php
> >> _______________________________________________
> >> devel mailing list
> >> de...@open-mpi.org
> >> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> >> Link to this post:
> >> http://www.open-mpi.org/community/lists/devel/2016/07/19290.php
> > _______________________________________________
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post:
> > http://www.open-mpi.org/community/lists/devel/2016/07/19291.php
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: http://www.open-mpi.org/community/lists/devel/2016/07/19292.php
