Pasha,

That's great, thanks for the help. When exactly do you expect 1.2.8 to be released?

Thanks,
Matt

On Tue, Oct 7, 2008 at 1:29 PM, Pavel Shamis (Pasha) <pa...@dev.mellanox.co.il> wrote:

> Matt,
> For all 1.2.X versions you should use btl_openib_ib_pkey_val.
> In the ongoing 1.3 version the parameter was renamed to btl_openib_of_pkey_val.
>
> BTW we plan to release version 1.2.8 very soon and it will include the
> partition bug fix.
>
> Regards,
> Pasha
>
> Matt Burgess wrote:
>
>> Pasha,
>>
>> With your patch and parameter suggestion, it works! So to be clear
>> btl_openib_ib_pkey_val is for 1.2.6 and btl_openib_of_pkey_val is for 1.2.7?
>>
>> Thanks again,
>> Matt
>>
>> On Tue, Oct 7, 2008 at 12:24 PM, Pavel Shamis (Pasha)
>> <pa...@dev.mellanox.co.il> wrote:
>>
>> Matt,
>> Can you please run "cat /sys/class/infiniband/mlx4_0/ports/1/pkeys/*"
>> on your d2-ib,d3-ib.
>> I would like to check the partition configuration.
>>
>> Ohh, BTW I see that the command line in the previous email was wrong.
>> Please use the following command line (the parameter name should be
>> "btl_openib_ib_pkey_val" for ompi-1.2.6 and my patch accepts
>> HEX/DEC values):
>>
>>     /opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -H d2-ib,d3-ib -mca btl openib,self -mca btl_openib_ib_pkey_val 0x8109 /cluster/pallas/x86_64-ib/IMB-MPI1
>>
>> Ompi 1.2.6 version should work ok with this patch.
>>
>> Thanks,
>> Pasha
>>
>> Matt Burgess wrote:
>>
>> Pasha,
>>
>> Thanks for the patch. Unfortunately, it doesn't seem like that
>> fixed the problem. I realized earlier I didn't mention what
>> version of OpenMPI I was trying - it's 1.2.6. Should I be trying
>> 1.2.7 with this patch?
>>
>> Thanks,
>> Matt
>>
>> 2008/10/7 Pavel Shamis (Pasha) <pa...@dev.mellanox.co.il>
>>
>> Matt,
>> Can you please try the attached patch? I guess it will resolve this
>> issue.
>>
>> Thanks,
>> Pasha
>>
>> Matt Burgess wrote:
>>
>> Lenny,
>>
>> Thanks for the info. It still doesn't seem to be working.
>> My command line is:
>>
>>     /opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -H d2-ib,d3-ib -mca btl openib,self -mca btl_openib_of_pkey_val 33033 /cluster/pallas/x86_64-ib/IMB-MPI1
>>
>> I don't have a "/sys/class/infiniband/mthca0/ports/1/pkeys/"
>> but I do have "/sys/class/infiniband/mlx4_0/ports/1/pkeys/".
>> Its contents are:
>>
>>     0 106 114 122 16 24 32 40 49 57 65 73 81 9 98
>>     1 107 115 123 17 25 33 41 5 58 66 74 82 90 99
>>     10 108 116 124 18 26 34 42 50 59 67 75 83 91
>>     100 109 117 125 19 27 35 43 51 6 68 76 84 92
>>     101 11 118 126 2 28 36 44 52 60 69 77 85 93
>>     102 110 119 127 20 29 37 45 53 61 7 78 86 94
>>     103 111 12 13 21 3 38 46 54 62 70 79 87 95
>>     104 112 120 14 22 30 39 47 55 63 71 8 88 96
>>     105 113 121 15 23 31 4 48 56 64 72 80 89 97
>>
>> We aren't using the opensm, but Voltaire's SM on a 2012 switch.
>>
>> Thanks again,
>> Matt
>>
>> On Tue, Oct 7, 2008 at 9:37 AM, Lenny Verkhovsky
>> <lenny.verkhov...@gmail.com> wrote:
>>
>> Hi Matt,
>>
>> It seems that the right way to do it is the following:
>>
>>     -mca btl openib,self -mca btl_openib_ib_pkey_val 33033
>>
>> where the value is the pkey as a decimal number, in your case
>> 0x8109 = 33033, and there is no need for a btl_openib_ib_pkey_ix value.
>>
>> ex.
>>
>>     mpirun -np 2 -H witch2,witch3 -mca btl openib,self -mca btl_openib_ib_pkey_val 32769 ./mpi_p1_4_1_2 -t lt
>>     LT (2) (size min max avg) 1 3.511429 3.511429 3.511429
>>
>> If it's not working, check "cat /sys/class/infiniband/mthca0/ports/1/pkeys/*"
>> for pkeys and the SM; maybe it's a setup issue.
>>
>> Pasha is currently checking this issue.
>>
>> Best regards,
>>
>> Lenny.
>>
>> On 10/7/08, Jeff Squyres <jsquy...@cisco.com> wrote:
>>
>> FWIW, if this configuration is for all of your users, you
>> might want to specify these MCA params in the default MCA
>> param file, or the environment, etc., just so that you
>> don't have to specify them on every mpirun command line.
>>
>> See
>> http://www.open-mpi.org/faq/?category=tuning#setting-mca-params.
>>
>> On Oct 7, 2008, at 5:43 AM, Lenny Verkhovsky wrote:
>>
>> Sorry, I misunderstood the question.
>>
>> Thanks to Pasha, the right command line will be
>>
>>     -mca btl openib,self -mca btl_openib_of_pkey_val 0x8109 -mca btl_openib_of_pkey_ix 1
>>
>> ex.
>>
>>     #mpirun -np 2 -H witch2,witch3 -mca btl openib,self -mca btl_openib_of_pkey_val 0x8001 -mca btl_openib_of_pkey_ix 1 ./mpi_p1_4_TRUNK -t lt
>>     LT (2) (size min max avg) 1 3.443480 3.443480 3.443480
>>
>> Best regards
>>
>> Lenny.
>>
>> On 10/6/08, Jeff Squyres <jsquy...@cisco.com> wrote:
>>
>> On Oct 5, 2008, at 1:22 PM, Lenny Verkhovsky wrote:
>>
>> you should probably use -mca btl tcp,self -mca
>> btl_openib_if_include ib0.8109
>>
>> Really? I thought we only took OpenFabrics device names
>> in the openib_if_include MCA param...? It looks like
>> ib0.8109 is an IPoIB device name.
>>
>> Lenny.
>>
>> On 10/3/08, Matt Burgess <burgess.m...@gmail.com> wrote:
>>
>> Hi,
>>
>> I'm trying to get openmpi working over openib partitions.
>> On this cluster, the partition number is 0x109. The ib
>> interfaces are pingable over the appropriate ib0.8109
>> interface:
>>
>>     d2:/opt/openmpi-ib # ifconfig ib0.8109
>>     ib0.8109  Link encap:UNSPEC  HWaddr 80-00-00-4A-FE-80-00-00-00-00-00-00-00-00-00-00
>>               inet addr:10.21.48.2  Bcast:10.21.255.255  Mask:255.255.0.0
>>               inet6 addr: fe80::202:c902:26:ca01/64 Scope:Link
>>               UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
>>               RX packets:16811 errors:0 dropped:0 overruns:0 frame:0
>>               TX packets:15848 errors:0 dropped:1 overruns:0 carrier:0
>>               collisions:0 txqueuelen:256
>>               RX bytes:102229428 (97.4 Mb)  TX bytes:102324172 (97.5 Mb)
>>
>> I have tried the following:
>>
>>     /opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -machinefile machinefile -mca btl openib,self -mca btl_openib_max_btls 1 -mca btl_openib_ib_pkey_val 0x8109 -mca btl_openib_ib_pkey_ix 1 /cluster/pallas/x86_64-ib/IMB-MPI1
>>
>> but I just get a RETRY EXCEEDED ERROR. Is there an MCA
>> parameter I am missing?
>>
>> I was successful using tcp only:
>>
>>     /opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -machinefile machinefile -mca btl tcp,self -mca btl_openib_max_btls 1 -mca btl_openib_ib_pkey_val 0x8109 /cluster/pallas/x86_64-ib/IMB-MPI1
>>
>> Thanks,
>> Matt Burgess
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>> --
>> Jeff Squyres
>> Cisco Systems
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>> --
>> Pavel Shamis (Pasha)
>> Mellanox Technologies LTD.
>>
>> Index: ompi/mca/btl/openib/btl_openib_component.c
>> ===================================================================
>> --- ompi/mca/btl/openib/btl_openib_component.c  (revision 19490)
>> +++ ompi/mca/btl/openib/btl_openib_component.c  (working copy)
>> @@ -558,7 +558,7 @@ static int init_one_hca(opal_list_t *btl
>>          goto dealloc_pd;
>>      }
>>
>> -    ret = OMPI_SUCCESS;
>> +    ret = OMPI_SUCCESS;
>>      /* Note ports are 1 based hence j = 1 */
>>      for(i = 1; i <= hca->ib_dev_attr.phys_port_cnt; i++){
>>          struct ibv_port_attr ib_port_attr;
>> @@ -580,7 +580,7 @@ static int init_one_hca(opal_list_t *btl
>>                  uint16_t pkey,j;
>>                  for (j=0; j < hca->ib_dev_attr.max_pkeys; j++) {
>>                      ibv_query_pkey(hca->ib_dev_context, i, j, &pkey);
>> -                    pkey=ntohs(pkey);
>> +                    pkey=ntohs(pkey) & 0x7fff;
>>                      if(pkey == mca_btl_openib_component.ib_pkey_val){
>>                          ret = init_one_port(btl_list, hca, i, j, &ib_port_attr);
>>                          break;
>> Index: ompi/mca/btl/openib/btl_openib_ini.c
>> ===================================================================
>> --- ompi/mca/btl/openib/btl_openib_ini.c  (revision 19490)
>> +++ ompi/mca/btl/openib/btl_openib_ini.c  (working copy)
>> @@ -90,8 +90,6 @@ static int parse_line(parsed_section_val
>>  static void reset_section(bool had_previous_value, parsed_section_values_t *s);
>>  static void reset_values(ompi_btl_openib_ini_values_t *v);
>>  static int save_section(parsed_section_values_t *s);
>> -static int intify(char *string);
>> -static int intify_list(char *str, uint32_t **values, int *len);
>>  static inline void show_help(const char *topic);
>>
>>
>> @@ -364,14 +362,14 @@ static int parse_line(parsed_section_val
>>         all whitespace at the beginning and ending of the value. */
>>
>>      if (0 == strcasecmp(key_buffer, "vendor_id")) {
>> -        if (OMPI_SUCCESS != (ret = intify_list(value, &sv->vendor_ids,
>> +        if (OMPI_SUCCESS != (ret = ompi_btl_openib_ini_intify_list(value, &sv->vendor_ids,
>>                                                 &sv->vendor_ids_len))) {
>>              return ret;
>>          }
>>      }
>>
>>      else if (0 == strcasecmp(key_buffer, "vendor_part_id")) {
>> -        if (OMPI_SUCCESS != (ret = intify_list(value, &sv->vendor_part_ids,
>> +        if (OMPI_SUCCESS != (ret = ompi_btl_openib_ini_intify_list(value, &sv->vendor_part_ids,
>>                                                 &sv->vendor_part_ids_len))) {
>>              return ret;
>>          }
>> @@ -379,13 +377,13 @@ static int parse_line(parsed_section_val
>>
>>      else if (0 == strcasecmp(key_buffer, "mtu")) {
>>          /* Single value */
>> -        sv->values.mtu = (uint32_t) intify(value);
>> +        sv->values.mtu = (uint32_t) ompi_btl_openib_ini_intify(value);
>>          sv->values.mtu_set = true;
>>      }
>>
>>      else if (0 == strcasecmp(key_buffer, "use_eager_rdma")) {
>>          /* Single value */
>> -        sv->values.use_eager_rdma = (uint32_t) intify(value);
>> +        sv->values.use_eager_rdma = (uint32_t) ompi_btl_openib_ini_intify(value);
>>          sv->values.use_eager_rdma_set = true;
>>      }
>>
>> @@ -547,7 +545,7 @@ static int save_section(parsed_section_v
>>  /*
>>   * Do string-to-integer conversion, for both hex and decimal numbers
>>   */
>> -static int intify(char *str)
>> +int ompi_btl_openib_ini_intify(char *str)
>>  {
>>      while (isspace(*str)) {
>>          ++str;
>> @@ -568,7 +566,7 @@ static int intify(char *str)
>>  /*
>>   * Take a comma-delimited list and infity them all
>>   */
>> -static int intify_list(char *value, uint32_t **values, int *len)
>> +int ompi_btl_openib_ini_intify_list(char *value, uint32_t **values, int *len)
>>  {
>>      char *comma;
>>      char *str = value;
>> @@ -584,7 +582,7 @@ static int intify_list(char *value, uint
>>          if (NULL == *values) {
>>              return OMPI_ERR_OUT_OF_RESOURCE;
>>          }
>> -        *values[0] = (uint32_t) intify(str);
>> +        *values[0] = (uint32_t) ompi_btl_openib_ini_intify(str);
>>          *len = 1;
>>      } else {
>>          /* If we found a comma, loop over all the values.  Be a
>> @@ -594,7 +592,7 @@ static int intify_list(char *value, uint
>>          do {
>>              *comma = '\0';
>>              *values = realloc(*values, sizeof(uint32_t) * (*len + 2));
>> -            (*values)[*len] = (int32_t) intify(str);
>> +            (*values)[*len] = (int32_t) ompi_btl_openib_ini_intify(str);
>>              ++(*len);
>>              str = comma + 1;
>>              comma = strchr(str, ',');
>> @@ -602,7 +600,7 @@ static int intify_list(char *value, uint
>>          /* Get the last value (i.e., the value after the last
>>             comma, because it won't have been snarfed in the
>>             loop) */
>> -        (*values)[*len] = (uint32_t) intify(str);
>> +        (*values)[*len] = (uint32_t) ompi_btl_openib_ini_intify(str);
>>          ++(*len);
>>      }
>>
>> Index: ompi/mca/btl/openib/btl_openib_ini.h
>> ===================================================================
>> --- ompi/mca/btl/openib/btl_openib_ini.h  (revision 19490)
>> +++ ompi/mca/btl/openib/btl_openib_ini.h  (working copy)
>> @@ -49,6 +49,9 @@ extern "C" {
>>       */
>>      int ompi_btl_openib_ini_finalize(void);
>>
>> +    int ompi_btl_openib_ini_intify(char *string);
>> +    int ompi_btl_openib_ini_intify_list(char *str, uint32_t **values, int *len);
>> +
>>  #if defined(c_plusplus) || defined(__cplusplus)
>>  }
>>  #endif
>> Index: ompi/mca/btl/openib/btl_openib_mca.c
>> ===================================================================
>> --- ompi/mca/btl/openib/btl_openib_mca.c  (revision 19490)
>> +++ ompi/mca/btl/openib/btl_openib_mca.c  (working copy)
>> @@ -27,6 +27,7 @@
>>  #include "opal/mca/base/mca_base_param.h"
>>  #include "btl_openib.h"
>>  #include "btl_openib_mca.h"
>> +#include "btl_openib_ini.h"
>>
>>  /*
>>   * Local flags
>> @@ -97,7 +98,7 @@ static inline int reg_int(const char* pa
>>   */
>>  int btl_openib_register_mca_params(void)
>>  {
>> -    char *msg, *str;
>> +    char *msg, *str, *pkey;
>>      int ival, ival2, ret, tmp;
>>
>>      ret = OMPI_SUCCESS;
>> @@ -192,13 +193,15 @@ int btl_openib_register_mca_params(void)
>>                    0, &ival, REGINT_GE_ZERO));
>>      mca_btl_openib_component.ib_pkey_ix = (uint32_t) ival;
>>
>> -    CHECK(reg_int("ib_pkey_val", "InfiniBand pkey value"
>> +    CHECK(reg_string("ib_pkey_val", "InfiniBand pkey value"
>>                    "(must be > 0 and < 0xffff)",
>> -                  0, &ival, REGINT_GE_ZERO));
>> -    if (ival > 0xffff) {
>> +                  "0", &pkey, 0));
>> +    mca_btl_openib_component.ib_pkey_val = ompi_btl_openib_ini_intify(pkey) & 0x7fff;
>> +    if (mca_btl_openib_component.ib_pkey_val > 0xffff ||
>> +        mca_btl_openib_component.ib_pkey_val < 0) {
>>          ret = OMPI_ERR_BAD_PARAM;
>>      }
>> -    mca_btl_openib_component.ib_pkey_val = (uint32_t) ival;
>> +    free(pkey);
>>
>>      CHECK(reg_int("ib_psn", "InfiniBand packet sequence starting number "
>>                    "(must be >= 0)",
>>
>> --
>> Pavel Shamis (Pasha)
>> Mellanox Technologies LTD.
>
> --
> Pavel Shamis (Pasha)
> Mellanox Technologies LTD.
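
For anyone reading the patch later: the central change is masking pkeys with 0x7fff before comparing them. In InfiniBand the top bit of a 16-bit pkey is the membership bit, so 0x8109 and 0x0109 name the same partition (0x109 on this cluster). Below is a minimal standalone sketch of that idea, not the Open MPI code itself. The helper names pkey_base_from_string and pkey_matches are hypothetical, and the sketch assumes the C library's strtoul with base 0 is an acceptable stand-in for the patch's ompi_btl_openib_ini_intify when parsing hex or decimal input.

```c
#include <stdint.h>
#include <stdlib.h>

/* Bit 15 of a pkey is the membership bit (full vs. limited member);
 * the low 15 bits identify the partition. Same mask as in the patch. */
#define PKEY_BASE_MASK 0x7fff

/* Hypothetical helper (not part of Open MPI): parse a pkey given as
 * decimal ("33033") or hex ("0x8109") and return only its 15-bit
 * partition number. Assumption: strtoul with base 0 stands in for the
 * patch's intify routine; note that base 0 also treats a leading "0"
 * as octal. */
uint16_t pkey_base_from_string(const char *str)
{
    unsigned long value = strtoul(str, NULL, 0);
    return (uint16_t)(value & PKEY_BASE_MASK);
}

/* Hypothetical helper: compare a pkey read from the port's pkey table
 * with the requested one, ignoring the membership bit on both sides. */
int pkey_matches(uint16_t table_pkey, uint16_t wanted_pkey)
{
    return (table_pkey & PKEY_BASE_MASK) == (wanted_pkey & PKEY_BASE_MASK);
}
```

With helpers like these, "0x8109" and "33033" both reduce to 0x0109, which is consistent with the hex and decimal forms of the parameter behaving the same once the patch is applied.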