Pasha,

That's great, thanks for the help. When exactly do you expect that 1.2.8
will be released?

Thanks,
Matt

On Tue, Oct 7, 2008 at 1:29 PM, Pavel Shamis (Pasha) <
pa...@dev.mellanox.co.il> wrote:

> Matt,
> For all 1.2.X versions you should use btl_openib_ib_pkey_val.
> In the in-progress 1.3 version the parameter was renamed to
> btl_openib_of_pkey_val.
>
> BTW, we plan to release version 1.2.8 very soon, and it will include the
> partition bug fix.
>
> Regards,
> Pasha
>
> Matt Burgess wrote:
>
>> Pasha,
>>
>> With your patch and parameter suggestion, it works! So to be clear
>> btl_openib_ib_pkey_val is for 1.2.6 and btl_openib_of_pkey_val is for 1.2.7?
>>
>> Thanks again,
>> Matt
>>
>> On Tue, Oct 7, 2008 at 12:24 PM, Pavel Shamis (Pasha) <
>> pa...@dev.mellanox.co.il> wrote:
>>
>>    Matt,
>>    Can you please run " cat
>>    /sys/class/infiniband/mlx4_0/ports/1/pkeys/* " on your d2-ib,d3-ib.
>>    I would like to check the partition configuration.
>>
>>    Oh, BTW, I see that the command line in the previous email was wrong.
>>    Please use the following command line (the parameter name should be
>>    "btl_openib_ib_pkey_val" for ompi-1.2.6, and my patch accepts
>>    HEX/DEC values):
>>    /opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -H d2-ib,d3-ib -mca btl
>>    openib,self -mca btl_openib_ib_pkey_val 0x8109
>>    /cluster/pallas/x86_64-ib/IMB-MPI1
>>
>>    Ompi 1.2.6 version should work ok with this patch.
>>
>>
>>    Thanks,
>>    Pasha
>>
>>    Matt Burgess wrote:
>>
>>        Pasha,
>>
>>        Thanks for the patch. Unfortunately, it doesn't seem like that
>>        fixed the problem. I realized earlier I didn't mention what
>>        version of OpenMPI I was trying - it's 1.2.6. Should I be
>>        trying 1.2.7 with this patch?
>>
>>        Thanks,
>>        Matt
>>
>>        2008/10/7 Pavel Shamis (Pasha) <pa...@dev.mellanox.co.il>
>>
>>
>>           Matt,
>>           Can you please try the attached patch? I guess it will resolve
>>           this issue.
>>
>>           Thanks,
>>           Pasha
>>
>>           Matt Burgess wrote:
>>
>>               Lenny,
>>
>>               Thanks for the info. It still doesn't seem to be working.
>>               My command line is:
>>
>>               /opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -H d2-ib,d3-ib -mca btl
>>               openib,self -mca btl_openib_of_pkey_val 33033
>>               /cluster/pallas/x86_64-ib/IMB-MPI1
>>
>>               I don't have a "/sys/class/infiniband/mthca0/ports/1/pkeys/"
>>               but I do have "/sys/class/infiniband/mlx4_0/ports/1/pkeys/".
>>               Its contents are:
>>
>>               0    106  114  122  16   24   32   40   49   57   65   73   81   9    98
>>               1    107  115  123  17   25   33   41   5    58   66   74   82   90   99
>>               10   108  116  124  18   26   34   42   50   59   67   75   83   91
>>               100  109  117  125  19   27   35   43   51   6    68   76   84   92
>>               101  11   118  126  2    28   36   44   52   60   69   77   85   93
>>               102  110  119  127  20   29   37   45   53   61   7    78   86   94
>>               103  111  12   13   21   3    38   46   54   62   70   79   87   95
>>               104  112  120  14   22   30   39   47   55   63   71   8    88   96
>>               105  113  121  15   23   31   4    48   56   64   72   80   89   97
>>
>>               We aren't using the opensm, but voltaire's SM on a 2012 switch.
>>
>>               Thanks again,
>>               Matt
>>
>>
>>               On Tue, Oct 7, 2008 at 9:37 AM, Lenny Verkhovsky
>>               <lenny.verkhov...@gmail.com> wrote:
>>
>>                  Hi Matt,
>>
>>                  It seems that the right way to do it is the following:
>>
>>                  -mca btl openib,self -mca btl_openib_ib_pkey_val 33033
>>
>>                  where the value is the pkey as a decimal number (in
>>                  your case 0x8109 = 33033), and there is no need for the
>>                  btl_openib_ib_pkey_ix value.
>>
>>                  ex.
>>                  mpirun -np 2 -H witch2,witch3 -mca btl openib,self -mca
>>                  btl_openib_ib_pkey_val 32769 ./mpi_p1_4_1_2 -t lt
>>                  LT (2) (size min max avg) 1 3.511429 3.511429 3.511429
>>
>>                  If it's not working, check cat
>>                  /sys/class/infiniband/mthca0/ports/1/pkeys/* for the
>>                  pkeys and the SM; maybe it's a setup issue.
>>
>>                  Pasha is currently checking this issue.
>>
>>                  Best regards,
>>
>>                  Lenny.
>>
>>
>>
>>
>>
>>                  On 10/7/08, *Jeff Squyres* <jsquy...@cisco.com> wrote:
>>
>>                      FWIW, if this configuration is for all of your users,
>>                      you might want to specify these MCA params in the
>>                      default MCA param file, or the environment, etc.  Just
>>                      so that you don't have to specify it on every mpirun
>>                      command line.
>>
>>                      See
>>                      http://www.open-mpi.org/faq/?category=tuning#setting-mca-params.
>>
>>
>>
>>                      On Oct 7, 2008, at 5:43 AM, Lenny Verkhovsky wrote:
>>
>>                          Sorry, I misunderstood the question.
>>
>>                          Thanks to Pasha, the right command line will be
>>
>>                          -mca btl openib,self -mca btl_openib_of_pkey_val 0x8109
>>                          -mca btl_openib_of_pkey_ix 1
>>
>>                          ex.
>>
>>                          #mpirun -np 2 -H witch2,witch3 -mca btl openib,self -mca
>>                          btl_openib_of_pkey_val 0x8001 -mca btl_openib_of_pkey_ix 1
>>                          ./mpi_p1_4_TRUNK -t lt
>>                          LT (2) (size min max avg) 1 3.443480 3.443480 3.443480
>>
>>
>>                          Best regards
>>
>>                          Lenny.
>>
>>
>>                          On 10/6/08, Jeff Squyres <jsquy...@cisco.com> wrote:
>>
>>                          On Oct 5, 2008, at 1:22 PM, Lenny Verkhovsky wrote:
>>
>>                          You should probably use -mca btl tcp,self  -mca
>>                          btl_openib_if_include ib0.8109
>>
>>
>>                          Really?  I thought we only took OpenFabrics device
>>                          names in the openib_if_include MCA param...?  It
>>                          looks like ib0.8109 is an IPoIB device name.
>>
>>
>>
>>                          Lenny.
>>
>>
>>
>>                          On 10/3/08, Matt Burgess <burgess.m...@gmail.com> wrote:
>>                          Hi,
>>
>>
>>                          I'm trying to get openmpi working over openib
>>                          partitions. On this cluster, the partition number is
>>                          0x109. The ib interfaces are pingable over the
>>                          appropriate ib0.8109 interface:
>>
>>                          d2:/opt/openmpi-ib # ifconfig ib0.8109
>>                          ib0.8109  Link encap:UNSPEC  HWaddr 80-00-00-4A-FE-80-00-00-00-00-00-00-00-00-00-00
>>                                  inet addr:10.21.48.2  Bcast:10.21.255.255  Mask:255.255.0.0
>>                                  inet6 addr: fe80::202:c902:26:ca01/64 Scope:Link
>>                                  UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
>>                                  RX packets:16811 errors:0 dropped:0 overruns:0 frame:0
>>                                  TX packets:15848 errors:0 dropped:1 overruns:0 carrier:0
>>                                  collisions:0 txqueuelen:256
>>                                  RX bytes:102229428 (97.4 Mb)  TX bytes:102324172 (97.5 Mb)
>>
>>
>>                          I have tried the following:
>>
>>                          /opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -machinefile
>>                          machinefile -mca btl openib,self -mca btl_openib_max_btls
>>                          1 -mca btl_openib_ib_pkey_val 0x8109 -mca
>>                          btl_openib_ib_pkey_ix 1 /cluster/pallas/x86_64-ib/IMB-MPI1
>>
>>                          but I just get a RETRY EXCEEDED ERROR. Is there an MCA
>>                          parameter I am missing?
>>
>>                          I was successful using tcp only:
>>
>>                          /opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -machinefile
>>                          machinefile -mca btl tcp,self -mca btl_openib_max_btls 1
>>                          -mca btl_openib_ib_pkey_val 0x8109
>>                          /cluster/pallas/x86_64-ib/IMB-MPI1
>>
>>
>>
>>                          Thanks,
>>                          Matt Burgess
>>
>>                          _______________________________________________
>>                          users mailing list
>>                          us...@open-mpi.org
>>                          http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>>                          --
>>                          Jeff Squyres
>>                          Cisco Systems
>>
>>
>>
>>
>>                      --
>>                      Jeff Squyres
>>                      Cisco Systems
>>
>>
>>
>>
>>
>> ------------------------------------------------------------------------
>>
>>               _______________________________________________
>>               devel mailing list
>>               de...@open-mpi.org
>>
>>
>>               http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>>
>>
>>           --
>>           Pavel Shamis (Pasha)
>>           Mellanox Technologies LTD.
>>
>>
>>           Index: ompi/mca/btl/openib/btl_openib_component.c
>>           ===================================================================
>>           --- ompi/mca/btl/openib/btl_openib_component.c  (revision 19490)
>>           +++ ompi/mca/btl/openib/btl_openib_component.c  (working copy)
>>           @@ -558,7 +558,7 @@ static int init_one_hca(opal_list_t *btl
>>                    goto dealloc_pd;
>>               }
>>
>>           -    ret = OMPI_SUCCESS;
>>           +    ret = OMPI_SUCCESS;
>>               /* Note ports are 1 based hence j = 1 */
>>               for(i = 1; i <= hca->ib_dev_attr.phys_port_cnt; i++){
>>                   struct ibv_port_attr ib_port_attr;
>>           @@ -580,7 +580,7 @@ static int init_one_hca(opal_list_t *btl
>>                           uint16_t pkey,j;
>>                           for (j=0; j < hca->ib_dev_attr.max_pkeys; j++) {
>>                               ibv_query_pkey(hca->ib_dev_context, i, j, &pkey);
>>           -                    pkey=ntohs(pkey);
>>           +                    pkey=ntohs(pkey) & 0x7fff;
>>                               if(pkey == mca_btl_openib_component.ib_pkey_val){
>>                                   ret = init_one_port(btl_list, hca, i, j, &ib_port_attr);
>>                                   break;
>>           Index: ompi/mca/btl/openib/btl_openib_ini.c
>>           ===================================================================
>>           --- ompi/mca/btl/openib/btl_openib_ini.c        (revision 19490)
>>           +++ ompi/mca/btl/openib/btl_openib_ini.c        (working copy)
>>           @@ -90,8 +90,6 @@ static int parse_line(parsed_section_val
>>            static void reset_section(bool had_previous_value, parsed_section_values_t *s);
>>            static void reset_values(ompi_btl_openib_ini_values_t *v);
>>            static int save_section(parsed_section_values_t *s);
>>           -static int intify(char *string);
>>           -static int intify_list(char *str, uint32_t **values, int *len);
>>            static inline void show_help(const char *topic);
>>
>>
>>           @@ -364,14 +362,14 @@ static int parse_line(parsed_section_val
>>                  all whitespace at the beginning and ending of the value. */
>>
>>               if (0 == strcasecmp(key_buffer, "vendor_id")) {
>>           -        if (OMPI_SUCCESS != (ret = intify_list(value, &sv->vendor_ids,
>>           +        if (OMPI_SUCCESS != (ret = ompi_btl_openib_ini_intify_list(value, &sv->vendor_ids,
>>                                                          &sv->vendor_ids_len))) {
>>                       return ret;
>>                   }
>>               }
>>
>>               else if (0 == strcasecmp(key_buffer, "vendor_part_id")) {
>>           -        if (OMPI_SUCCESS != (ret = intify_list(value, &sv->vendor_part_ids,
>>           +        if (OMPI_SUCCESS != (ret = ompi_btl_openib_ini_intify_list(value, &sv->vendor_part_ids,
>>                                                          &sv->vendor_part_ids_len))) {
>>                       return ret;
>>                   }
>>           @@ -379,13 +377,13 @@ static int parse_line(parsed_section_val
>>
>>               else if (0 == strcasecmp(key_buffer, "mtu")) {
>>                   /* Single value */
>>           -        sv->values.mtu = (uint32_t) intify(value);
>>           +        sv->values.mtu = (uint32_t) ompi_btl_openib_ini_intify(value);
>>                   sv->values.mtu_set = true;
>>               }
>>
>>               else if (0 == strcasecmp(key_buffer, "use_eager_rdma")) {
>>                   /* Single value */
>>           -        sv->values.use_eager_rdma = (uint32_t) intify(value);
>>           +        sv->values.use_eager_rdma = (uint32_t) ompi_btl_openib_ini_intify(value);
>>                   sv->values.use_eager_rdma_set = true;
>>               }
>>
>>           @@ -547,7 +545,7 @@ static int save_section(parsed_section_v
>>            /*
>>            * Do string-to-integer conversion, for both hex and decimal numbers
>>            */
>>           -static int intify(char *str)
>>           +int ompi_btl_openib_ini_intify(char *str)
>>            {
>>               while (isspace(*str)) {
>>                   ++str;
>>           @@ -568,7 +566,7 @@ static int intify(char *str)
>>            /*
>>            * Take a comma-delimited list and infity them all
>>            */
>>           -static int intify_list(char *value, uint32_t **values, int *len)
>>           +int ompi_btl_openib_ini_intify_list(char *value, uint32_t **values, int *len)
>>            {
>>               char *comma;
>>               char *str = value;
>>           @@ -584,7 +582,7 @@ static int intify_list(char *value, uint
>>                   if (NULL == *values) {
>>                       return OMPI_ERR_OUT_OF_RESOURCE;
>>                   }
>>           -        *values[0] = (uint32_t) intify(str);
>>           +        *values[0] = (uint32_t) ompi_btl_openib_ini_intify(str);
>>                   *len = 1;
>>               } else {
>>                   /* If we found a comma, loop over all the values.  Be a
>>           @@ -594,7 +592,7 @@ static int intify_list(char *value, uint
>>                   do {
>>                       *comma = '\0';
>>                       *values = realloc(*values, sizeof(uint32_t) * (*len + 2));
>>           -            (*values)[*len] = (int32_t) intify(str);
>>           +            (*values)[*len] = (int32_t) ompi_btl_openib_ini_intify(str);
>>                       ++(*len);
>>                       str = comma + 1;
>>                       comma = strchr(str, ',');
>>           @@ -602,7 +600,7 @@ static int intify_list(char *value, uint
>>                   /* Get the last value (i.e., the value after the last
>>                      comma, because it won't have been snarfed in the
>>                      loop) */
>>           -        (*values)[*len] = (uint32_t) intify(str);
>>           +        (*values)[*len] = (uint32_t) ompi_btl_openib_ini_intify(str);
>>                   ++(*len);
>>               }
>>
>>           Index: ompi/mca/btl/openib/btl_openib_ini.h
>>           ===================================================================
>>           --- ompi/mca/btl/openib/btl_openib_ini.h        (revision 19490)
>>           +++ ompi/mca/btl/openib/btl_openib_ini.h        (working copy)
>>           @@ -49,6 +49,9 @@ extern "C" {
>>                */
>>               int ompi_btl_openib_ini_finalize(void);
>>
>>           +    int ompi_btl_openib_ini_intify(char *string);
>>           +    int ompi_btl_openib_ini_intify_list(char *str, uint32_t **values, int *len);
>>           +
>>            #if defined(c_plusplus) || defined(__cplusplus)
>>            }
>>            #endif
>>           Index: ompi/mca/btl/openib/btl_openib_mca.c
>>           ===================================================================
>>           --- ompi/mca/btl/openib/btl_openib_mca.c        (revision 19490)
>>           +++ ompi/mca/btl/openib/btl_openib_mca.c        (working copy)
>>           @@ -27,6 +27,7 @@
>>            #include "opal/mca/base/mca_base_param.h"
>>            #include "btl_openib.h"
>>            #include "btl_openib_mca.h"
>>           +#include "btl_openib_ini.h"
>>
>>            /*
>>            * Local flags
>>           @@ -97,7 +98,7 @@ static inline int reg_int(const char* pa
>>            */
>>            int btl_openib_register_mca_params(void)
>>            {
>>           -    char *msg, *str;
>>           +    char *msg, *str, *pkey;
>>               int ival, ival2, ret, tmp;
>>
>>               ret = OMPI_SUCCESS;
>>           @@ -192,13 +193,15 @@ int btl_openib_register_mca_params(void)
>>                             0, &ival, REGINT_GE_ZERO));
>>               mca_btl_openib_component.ib_pkey_ix = (uint32_t) ival;
>>
>>           -    CHECK(reg_int("ib_pkey_val", "InfiniBand pkey value"
>>           +    CHECK(reg_string("ib_pkey_val", "InfiniBand pkey value"
>>                             "(must be > 0 and < 0xffff)",
>>           -                  0, &ival, REGINT_GE_ZERO));
>>           -    if (ival > 0xffff) {
>>           +                  "0", &pkey, 0));
>>           +    mca_btl_openib_component.ib_pkey_val = ompi_btl_openib_ini_intify(pkey) & 0x7fff;
>>           +    if (mca_btl_openib_component.ib_pkey_val > 0xffff ||
>>           +            mca_btl_openib_component.ib_pkey_val < 0) {
>>                   ret = OMPI_ERR_BAD_PARAM;
>>               }
>>           -    mca_btl_openib_component.ib_pkey_val = (uint32_t) ival;
>>           +    free(pkey);
>>
>>               CHECK(reg_int("ib_psn", "InfiniBand packet sequence starting number "
>>                             "(must be >= 0)",
>>
>>
>>
>>
>>
>>    --
>>    Pavel Shamis (Pasha)
>>    Mellanox Technologies LTD.
>>
>>
>>
>
> --
> Pavel Shamis (Pasha)
> Mellanox Technologies LTD.
>
>
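A note on the two forms of the pkey that appear in this thread (0x109 in the original question, 0x8109 = 33033 on the command lines): an InfiniBand P_Key is a 16-bit value whose top bit is the membership bit, and the patch quoted above compares only the low 15 bits by masking with 0x7fff. Below is a minimal sketch of that comparison in C. The helper names (pkey_base, pkey_is_full_member, pkey_matches) are made up for illustration and are not part of Open MPI.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 15 of a 16-bit P_Key is the membership bit (1 = full, 0 = limited). */
#define PKEY_MEMBERSHIP_BIT 0x8000u
/* The low 15 bits identify the partition itself. */
#define PKEY_BASE_MASK      0x7fffu

/* Hypothetical helper: strip the membership bit, the same operation the
 * patch performs with "& 0x7fff". */
static uint16_t pkey_base(uint16_t pkey)
{
    return (uint16_t)(pkey & PKEY_BASE_MASK);
}

/* Hypothetical helper: true if the key carries full membership. */
static bool pkey_is_full_member(uint16_t pkey)
{
    return (pkey & PKEY_MEMBERSHIP_BIT) != 0;
}

/* Hypothetical helper: compare a P_Key table entry with a user-supplied
 * value, ignoring the membership bit on both sides. */
static bool pkey_matches(uint16_t table_entry, uint16_t wanted)
{
    return pkey_base(table_entry) == pkey_base(wanted);
}
```

With these helpers, pkey_base(0x8109) yields 0x0109, so 0x8109 and 0x109 name the same partition, while 0x8109 and 0x8001 do not match.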

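The patch also lets btl_openib_ib_pkey_val be given in either hex or decimal by passing the string through the intify routine. A rough, standalone sketch of that kind of parsing follows. It is not the Open MPI code: parse_pkey is a hypothetical name, and rejecting values above 0xffff is an assumption about what a caller would want.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse a P_Key written as decimal ("33033") or as
 * 0x-prefixed hex ("0x8109"). Returns 0 and stores the value in *out on
 * success, or -1 if the string is malformed or does not fit in 16 bits. */
static int parse_pkey(const char *str, uint16_t *out)
{
    char *end = NULL;
    unsigned long value;
    int base = 10;

    if (str == NULL || out == NULL) {
        return -1;
    }
    /* Skip leading whitespace. */
    while (isspace((unsigned char)*str)) {
        ++str;
    }
    /* Require a leading digit so that signs and empty strings are rejected. */
    if (!isdigit((unsigned char)*str)) {
        return -1;
    }
    if (str[0] == '0' && (str[1] == 'x' || str[1] == 'X')) {
        base = 16;
    }

    errno = 0;
    value = strtoul(str, &end, base);
    if (errno == ERANGE || end == str) {
        return -1;
    }
    /* Allow trailing whitespace, but nothing else. */
    while (isspace((unsigned char)*end)) {
        ++end;
    }
    if (*end != '\0' || value > 0xffffUL) {
        return -1;
    }

    *out = (uint16_t)value;
    return 0;
}
```

Both "0x8109" and "33033" parse to the same 16-bit value here, which is why either spelling can be passed on the mpirun command line once the patch is applied.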