Matt,
For all 1.2.x versions you should use btl_openib_ib_pkey_val.
In the upcoming 1.3 version the parameter was renamed to btl_openib_of_pkey_val.

BTW, we plan to release version 1.2.8 very soon, and it will include the partition bug fix.

Regards,
Pasha
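
The naming rule in the reply above can be written down as a small shell helper. This is only a sketch: the function name pkey_param_for_version is hypothetical, and it encodes nothing beyond what the message states (1.2.x uses btl_openib_ib_pkey_val, 1.3 uses btl_openib_of_pkey_val). Any other version string is rejected rather than guessed.

```shell
#!/bin/sh
# Hypothetical helper: map an Open MPI version string to the name of the
# pkey MCA parameter, following the rule stated in this thread.
pkey_param_for_version() {
    case "$1" in
        1.2|1.2.*) echo "btl_openib_ib_pkey_val" ;;
        1.3|1.3.*) echo "btl_openib_of_pkey_val" ;;
        *) echo "unknown Open MPI version: $1" >&2; return 1 ;;
    esac
}
```

It could be used as `mpirun -mca "$(pkey_param_for_version 1.2.6)" 33033 ...`. Running `ompi_info --param btl openib` should also list the parameter names that a given installation actually registers.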

Matt Burgess wrote:
Pasha,

With your patch and parameter suggestion, it works! So, to be clear, btl_openib_ib_pkey_val is for 1.2.6 and btl_openib_of_pkey_val is for 1.2.7?

Thanks again,
Matt

On Tue, Oct 7, 2008 at 12:24 PM, Pavel Shamis (Pasha) <pa...@dev.mellanox.co.il> wrote:

    Matt,
    Can you please run
    "cat /sys/class/infiniband/mlx4_0/ports/1/pkeys/*" on d2-ib and d3-ib?
    I would like to check the partition configuration.
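
The check requested above can also be scripted. The following is a minimal sketch, assuming that each file under the pkeys directory is named after its table index and holds a single value such as 0xffff. The function name find_pkey_index is hypothetical. It ignores the membership bit (0x8000) when comparing, in the same way as the "& 0x7fff" in the patch from this thread, and skips empty (zero) table slots.

```shell
#!/bin/sh
# Sketch: print the pkey table index whose value matches PKEY.
# Usage: find_pkey_index DIR PKEY   (PKEY may be hex or decimal)
find_pkey_index() {
    dir=$1
    want=$(( $2 & 0x7fff ))
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        val=$(cat "$f")
        [ -n "$val" ] || continue
        # Skip unused table slots (value 0).
        [ $(( $val )) -ne 0 ] || continue
        if [ $(( $val & 0x7fff )) -eq "$want" ]; then
            echo "${f##*/}"    # file name is the table index
            return 0
        fi
    done
    return 1
}
```

For example, `find_pkey_index /sys/class/infiniband/mlx4_0/ports/1/pkeys 0x8109` would print the index of the matching entry, or return a non-zero status if the partition is not configured on that port.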

    Oh, BTW, I see that the command line in the previous email was wrong.
    Please use the following command line (the parameter name should be
    "btl_openib_ib_pkey_val" for ompi-1.2.6, and my patch accepts
    HEX/DEC values):
    /opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -H d2-ib,d3-ib \
        -mca btl openib,self -mca btl_openib_ib_pkey_val 0x8109 \
        /cluster/pallas/x86_64-ib/IMB-MPI1

    Open MPI 1.2.6 should work OK with this patch.


    Thanks,
    Pasha

    Matt Burgess wrote:

        Pasha,

        Thanks for the patch. Unfortunately, it doesn't seem like that
        fixed the problem. I realized earlier I didn't mention what
        version of OpenMPI I was trying - it's 1.2.6. Should I be
        trying 1.2.7 with this patch?

        Thanks,
        Matt

        2008/10/7 Pavel Shamis (Pasha) <pa...@dev.mellanox.co.il>


           Matt,
           Can you please try the attached patch? I guess it will resolve
           this issue.

           Thanks,
           Pasha

           Matt Burgess wrote:

               Lenny,

               Thanks for the info. It doesn't seem to be working still.
               My command line is:

               /opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -H d2-ib,d3-ib \
                   -mca btl openib,self -mca btl_openib_of_pkey_val 33033 \
                   /cluster/pallas/x86_64-ib/IMB-MPI1

               I don't have a "/sys/class/infiniband/mthca0/ports/1/pkeys/"
               but I do have "/sys/class/infiniband/mlx4_0/ports/1/pkeys/".
               Its contents are:

               0    106  114  122  16   24   32   40   49   57   65   73   81   9    98
               1    107  115  123  17   25   33   41   5    58   66   74   82   90   99
               10   108  116  124  18   26   34   42   50   59   67   75   83   91
               100  109  117  125  19   27   35   43   51   6    68   76   84   92
               101  11   118  126  2    28   36   44   52   60   69   77   85   93
               102  110  119  127  20   29   37   45   53   61   7    78   86   94
               103  111  12   13   21   3    38   46   54   62   70   79   87   95
               104  112  120  14   22   30   39   47   55   63   71   8    88   96
               105  113  121  15   23   31   4    48   56   64   72   80   89   97

               We aren't using OpenSM, but Voltaire's SM on a 2012 switch.

               Thanks again,
               Matt


               On Tue, Oct 7, 2008 at 9:37 AM, Lenny Verkhovsky
               <lenny.verkhov...@gmail.com> wrote:

                  Hi Matt,

                  It seems that the right way to do it is the following:

                  -mca btl openib,self -mca btl_openib_ib_pkey_val 33033

                  where the value is the pkey as a decimal number (in your
                  case 0x8109 = 33033); there is no need for a
                  btl_openib_ib_pkey_ix value.

                  ex.
                  mpirun -np 2 -H witch2,witch3 -mca btl openib,self -mca
                  btl_openib_ib_pkey_val 32769 ./mpi_p1_4_1_2 -t lt
                  LT (2) (size min max avg) 1 3.511429 3.511429 3.511429
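
The hex-to-decimal conversion used in this message can be checked from a shell. A minimal sketch, with hypothetical function names pkey_to_dec and pkey_base; the second one strips the membership bit (0x8000), which is what the "& 0x7fff" comparison in the patch from this thread does.

```shell
#!/bin/sh
# Convert a pkey given in hex or decimal to decimal, for releases whose
# pkey parameter expects a decimal value.
pkey_to_dec() {
    echo $(( $1 ))
}

# Strip the membership bit (0x8000) to get the base partition number.
pkey_base() {
    echo $(( $1 & 0x7fff ))
}

pkey_to_dec 0x8109    # prints 33033
pkey_to_dec 0x8001    # prints 32769
pkey_base 0x8109      # prints 265, i.e. 0x109
```

The last value matches the partition number 0x109 reported for this cluster.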

                  If it's not working, check the output of
                  cat /sys/class/infiniband/mthca0/ports/1/pkeys/* for the
                  pkeys, and check the SM; maybe it's a setup issue.

                  Pasha is currently checking this issue.

                  Best regards,

                  Lenny.





                  On 10/7/08, Jeff Squyres <jsquy...@cisco.com> wrote:

                      FWIW, if this configuration is for all of your users,
                      you might want to specify these MCA params in the
                      default MCA param file, or the environment, etc., just
                      so that you don't have to specify them on every mpirun
                      command line.

                      See
                      http://www.open-mpi.org/faq/?category=tuning#setting-mca-params.
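
A minimal sketch of the suggestion above, assuming the 1.2.x parameter name and the decimal pkey value from this thread. Open MPI reads MCA parameters from environment variables named OMPI_MCA_<param> and from a per-user file, $HOME/.openmpi/mca-params.conf; the values shown are only this cluster's example settings.

```shell
#!/bin/sh
# Per-session: set MCA parameters through the environment instead of
# passing "-mca" options on every mpirun command line.
export OMPI_MCA_btl=openib,self
export OMPI_MCA_btl_openib_ib_pkey_val=33033   # 0x8109 in decimal

# Per-user alternative: put "name = value" lines in
# $HOME/.openmpi/mca-params.conf, for example:
#   btl = openib,self
#   btl_openib_ib_pkey_val = 33033
```

With either in place, a plain `mpirun -np 2 -H d2-ib,d3-ib ...` should pick up the same settings.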



                      On Oct 7, 2008, at 5:43 AM, Lenny Verkhovsky wrote:

                          Sorry, I misunderstood the question.

                          Thanks to Pasha, the right command line will be:

                          -mca btl openib,self -mca btl_openib_of_pkey_val 0x8109
                          -mca btl_openib_of_pkey_ix 1

                          ex.

                          #mpirun -np 2 -H witch2,witch3 -mca btl openib,self \
                              -mca btl_openib_of_pkey_val 0x8001 -mca btl_openib_of_pkey_ix 1 \
                              ./mpi_p1_4_TRUNK -t lt
                          LT (2) (size min max avg) 1 3.443480 3.443480 3.443480


                          Best regards

                          Lenny.


                          On 10/6/08, Jeff Squyres <jsquy...@cisco.com> wrote:

                          On Oct 5, 2008, at 1:22 PM, Lenny Verkhovsky wrote:

                          you should probably use -mca tcp,self  -mca
                          btl_openib_if_include ib0.8109


                          Really?  I thought we only took OpenFabrics device
                          names in the openib_if_include MCA param...?  It
                          looks like ib0.8109 is an IPoIB device name.



                          Lenny.



                          On 10/3/08, Matt Burgess <burgess.m...@gmail.com> wrote:
                          Hi,


                          I'm trying to get openmpi working over openib
                          partitions. On this cluster, the partition number is
                          0x109. The ib interfaces are pingable over the
                          appropriate ib0.8109 interface:

                          d2:/opt/openmpi-ib # ifconfig ib0.8109
                          ib0.8109  Link encap:UNSPEC  HWaddr 80-00-00-4A-FE-80-00-00-00-00-00-00-00-00-00-00
                                    inet addr:10.21.48.2  Bcast:10.21.255.255  Mask:255.255.0.0
                                    inet6 addr: fe80::202:c902:26:ca01/64 Scope:Link
                                    UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
                                    RX packets:16811 errors:0 dropped:0 overruns:0 frame:0
                                    TX packets:15848 errors:0 dropped:1 overruns:0 carrier:0
                                    collisions:0 txqueuelen:256
                                    RX bytes:102229428 (97.4 Mb)  TX bytes:102324172 (97.5 Mb)


                          I have tried the following:

                          /opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -machinefile machinefile \
                              -mca btl openib,self -mca btl_openib_max_btls 1 \
                              -mca btl_openib_ib_pkey_val 0x8109 -mca btl_openib_ib_pkey_ix 1 \
                              /cluster/pallas/x86_64-ib/IMB-MPI1

                          but I just get a RETRY EXCEEDED ERROR. Is there an MCA
                          parameter I am missing?

                          I was successful using tcp only:

                          /opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -machinefile machinefile \
                              -mca btl tcp,self -mca btl_openib_max_btls 1 \
                              -mca btl_openib_ib_pkey_val 0x8109 \
                              /cluster/pallas/x86_64-ib/IMB-MPI1



                          Thanks,
                          Matt Burgess

                          _______________________________________________
                          users mailing list
                          us...@open-mpi.org
                          http://www.open-mpi.org/mailman/listinfo.cgi/users



                          --            Jeff Squyres
                          Cisco Systems





                      --        Jeff Squyres
                      Cisco Systems




------------------------------------------------------------------------

               _______________________________________________
               devel mailing list
               de...@open-mpi.org
               http://www.open-mpi.org/mailman/listinfo.cgi/devel



           --    --
           Pavel Shamis (Pasha)
           Mellanox Technologies LTD.


           Index: ompi/mca/btl/openib/btl_openib_component.c
           ===================================================================
           --- ompi/mca/btl/openib/btl_openib_component.c  (revision 19490)
           +++ ompi/mca/btl/openib/btl_openib_component.c  (working copy)
           @@ -558,7 +558,7 @@ static int init_one_hca(opal_list_t *btl
                    goto dealloc_pd;
                }

           -    ret = OMPI_SUCCESS;
           +    ret = OMPI_SUCCESS;
                /* Note ports are 1 based hence j = 1 */
                for(i = 1; i <= hca->ib_dev_attr.phys_port_cnt; i++){
                    struct ibv_port_attr ib_port_attr;
           @@ -580,7 +580,7 @@ static int init_one_hca(opal_list_t *btl
                            uint16_t pkey,j;
                            for (j=0; j < hca->ib_dev_attr.max_pkeys; j++) {
                                ibv_query_pkey(hca->ib_dev_context, i, j, &pkey);
           -                    pkey=ntohs(pkey);
           +                    pkey=ntohs(pkey) & 0x7fff;
                                if(pkey == mca_btl_openib_component.ib_pkey_val){
                                    ret = init_one_port(btl_list, hca, i, j, &ib_port_attr);
                                    break;
           Index: ompi/mca/btl/openib/btl_openib_ini.c
           ===================================================================
           --- ompi/mca/btl/openib/btl_openib_ini.c        (revision 19490)
           +++ ompi/mca/btl/openib/btl_openib_ini.c        (working copy)
           @@ -90,8 +90,6 @@ static int parse_line(parsed_section_val
            static void reset_section(bool had_previous_value, parsed_section_values_t *s);
            static void reset_values(ompi_btl_openib_ini_values_t *v);
            static int save_section(parsed_section_values_t *s);
           -static int intify(char *string);
           -static int intify_list(char *str, uint32_t **values, int *len);
            static inline void show_help(const char *topic);


           @@ -364,14 +362,14 @@ static int parse_line(parsed_section_val
                   all whitespace at the beginning and ending of the value. */

                if (0 == strcasecmp(key_buffer, "vendor_id")) {
           -        if (OMPI_SUCCESS != (ret = intify_list(value, &sv->vendor_ids,
           +        if (OMPI_SUCCESS != (ret = ompi_btl_openib_ini_intify_list(value, &sv->vendor_ids,
                                                           &sv->vendor_ids_len))) {
                        return ret;
                    }
                }

                else if (0 == strcasecmp(key_buffer, "vendor_part_id")) {
           -        if (OMPI_SUCCESS != (ret = intify_list(value, &sv->vendor_part_ids,
           +        if (OMPI_SUCCESS != (ret = ompi_btl_openib_ini_intify_list(value, &sv->vendor_part_ids,
                                                           &sv->vendor_part_ids_len))) {
                        return ret;
                    }
           @@ -379,13 +377,13 @@ static int parse_line(parsed_section_val

                else if (0 == strcasecmp(key_buffer, "mtu")) {
                    /* Single value */
           -        sv->values.mtu = (uint32_t) intify(value);
           +        sv->values.mtu = (uint32_t) ompi_btl_openib_ini_intify(value);
                    sv->values.mtu_set = true;
                }

                else if (0 == strcasecmp(key_buffer, "use_eager_rdma")) {
                    /* Single value */
           -        sv->values.use_eager_rdma = (uint32_t) intify(value);
           +        sv->values.use_eager_rdma = (uint32_t) ompi_btl_openib_ini_intify(value);
                    sv->values.use_eager_rdma_set = true;
                }

           @@ -547,7 +545,7 @@ static int save_section(parsed_section_v
            /*
             * Do string-to-integer conversion, for both hex and decimal numbers
             */
           -static int intify(char *str)
           +int ompi_btl_openib_ini_intify(char *str)
            {
                while (isspace(*str)) {
                    ++str;
           @@ -568,7 +566,7 @@ static int intify(char *str)
            /*
             * Take a comma-delimited list and infity them all
             */
           -static int intify_list(char *value, uint32_t **values, int *len)
           +int ompi_btl_openib_ini_intify_list(char *value, uint32_t **values, int *len)
            {
                char *comma;
                char *str = value;
           @@ -584,7 +582,7 @@ static int intify_list(char *value, uint
                    if (NULL == *values) {
                        return OMPI_ERR_OUT_OF_RESOURCE;
                    }
           -        *values[0] = (uint32_t) intify(str);
           +        *values[0] = (uint32_t) ompi_btl_openib_ini_intify(str);
                    *len = 1;
                } else {
                    /* If we found a comma, loop over all the values.  Be a
           @@ -594,7 +592,7 @@ static int intify_list(char *value, uint
                    do {
                        *comma = '\0';
                        *values = realloc(*values, sizeof(uint32_t) * (*len + 2));
           -            (*values)[*len] = (int32_t) intify(str);
           +            (*values)[*len] = (int32_t) ompi_btl_openib_ini_intify(str);
                        ++(*len);
                        str = comma + 1;
                        comma = strchr(str, ',');
           @@ -602,7 +600,7 @@ static int intify_list(char *value, uint
                    /* Get the last value (i.e., the value after the last
                       comma, because it won't have been snarfed in the
                       loop) */
           -        (*values)[*len] = (uint32_t) intify(str);
           +        (*values)[*len] = (uint32_t) ompi_btl_openib_ini_intify(str);
                    ++(*len);
                }

           Index: ompi/mca/btl/openib/btl_openib_ini.h
           ===================================================================
           --- ompi/mca/btl/openib/btl_openib_ini.h        (revision 19490)
           +++ ompi/mca/btl/openib/btl_openib_ini.h        (working copy)
           @@ -49,6 +49,9 @@ extern "C" {
                 */
                int ompi_btl_openib_ini_finalize(void);

           +    int ompi_btl_openib_ini_intify(char *string);
           +    int ompi_btl_openib_ini_intify_list(char *str, uint32_t **values, int *len);
           +
            #if defined(c_plusplus) || defined(__cplusplus)
            }
            #endif
           Index: ompi/mca/btl/openib/btl_openib_mca.c
           ===================================================================
           --- ompi/mca/btl/openib/btl_openib_mca.c        (revision 19490)
           +++ ompi/mca/btl/openib/btl_openib_mca.c        (working copy)
           @@ -27,6 +27,7 @@
            #include "opal/mca/base/mca_base_param.h"
            #include "btl_openib.h"
            #include "btl_openib_mca.h"
           +#include "btl_openib_ini.h"

            /*
             * Local flags
           @@ -97,7 +98,7 @@ static inline int reg_int(const char* pa
             */
            int btl_openib_register_mca_params(void)
            {
           -    char *msg, *str;
           +    char *msg, *str, *pkey;
                int ival, ival2, ret, tmp;

                ret = OMPI_SUCCESS;
           @@ -192,13 +193,15 @@ int btl_openib_register_mca_params(void)
                              0, &ival, REGINT_GE_ZERO));
                mca_btl_openib_component.ib_pkey_ix = (uint32_t) ival;

           -    CHECK(reg_int("ib_pkey_val", "InfiniBand pkey value"
           +    CHECK(reg_string("ib_pkey_val", "InfiniBand pkey value"
                              "(must be > 0 and < 0xffff)",
           -                  0, &ival, REGINT_GE_ZERO));
           -    if (ival > 0xffff) {
           +                  "0", &pkey, 0));
           +    mca_btl_openib_component.ib_pkey_val = ompi_btl_openib_ini_intify(pkey) & 0x7fff;
           +    if (mca_btl_openib_component.ib_pkey_val > 0xffff ||
           +            mca_btl_openib_component.ib_pkey_val < 0) {
                    ret = OMPI_ERR_BAD_PARAM;
                }
           -    mca_btl_openib_component.ib_pkey_val = (uint32_t) ival;
           +    free(pkey);

                CHECK(reg_int("ib_psn", "InfiniBand packet sequence starting number "
                              "(must be >= 0)",





-- --
    Pavel Shamis (Pasha)
    Mellanox Technologies LTD.




--
--
Pavel Shamis (Pasha)
Mellanox Technologies LTD.
