Pasha,

With your patch and parameter suggestion, it works! So, to be clear: btl_openib_ib_pkey_val is for 1.2.6 and btl_openib_of_pkey_val is for 1.2.7?
Thanks again,
Matt

On Tue, Oct 7, 2008 at 12:24 PM, Pavel Shamis (Pasha) <pa...@dev.mellanox.co.il> wrote:

> Matt,
> Can you please run "cat /sys/class/infiniband/mlx4_0/ports/1/pkeys/*" on
> your d2-ib,d3-ib?
> I would like to check the partition configuration.
>
> Oh, BTW, I see that the command line in my previous email was wrong.
> Please use the following command line (the parameter name should be
> "btl_openib_ib_pkey_val" for ompi-1.2.6, and my patch accepts HEX/DEC
> values):
> /opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -H d2-ib,d3-ib -mca btl openib,self
> -mca btl_openib_ib_pkey_val 0x8109 /cluster/pallas/x86_64-ib/IMB-MPI1
>
> The ompi 1.2.6 version should work OK with this patch.
>
> Thanks,
> Pasha
>
> Matt Burgess wrote:
>
>> Pasha,
>>
>> Thanks for the patch. Unfortunately, it doesn't seem like that fixed the
>> problem. I realized earlier I didn't mention what version of OpenMPI I was
>> trying - it's 1.2.6. Should I be trying 1.2.7 with this patch?
>>
>> Thanks,
>> Matt
>>
>> 2008/10/7 Pavel Shamis (Pasha) <pa...@dev.mellanox.co.il>
>>
>> Matt,
>> Can you please try the attached patch? I guess it will resolve this
>> issue.
>>
>> Thanks,
>> Pasha
>>
>> Matt Burgess wrote:
>>
>> Lenny,
>>
>> Thanks for the info. It still doesn't seem to be working.
>> My command line is:
>>
>> /opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -H d2-ib,d3-ib -mca btl
>> openib,self -mca btl_openib_of_pkey_val 33033
>> /cluster/pallas/x86_64-ib/IMB-MPI1
>>
>> I don't have a "/sys/class/infiniband/mthca0/ports/1/pkeys/"
>> but I do have "/sys/class/infiniband/mlx4_0/ports/1/pkeys/".
>> Its contents are:
>>
>> 0 106 114 122 16 24 32 40 49 57 65 73 81
>> 9 98
>> 1 107 115 123 17 25 33 41 5 58 66 74 82
>> 90 99
>> 10 108 116 124 18 26 34 42 50 59 67 75 83
>> 91 100 109 117 125 19 27 35 43 51 6 68
>> 76 84 92 101 11 118 126 2 28 36 44 52 60
>> 69 77 85 93 102 110 119 127 20 29 37 45
>> 53 61 7 78 86 94 103 111 12 13 21 3 38
>> 46 54 62 70 79 87 95 104 112 120 14 22
>> 30 39 47 55 63 71 8 88 96 105 113 121 15
>> 23 31 4 48 56 64 72 80 89 97
>>
>> We aren't using OpenSM, but Voltaire's SM on a 2012 switch.
>>
>> Thanks again,
>> Matt
>>
>> On Tue, Oct 7, 2008 at 9:37 AM, Lenny Verkhovsky
>> <lenny.verkhov...@gmail.com> wrote:
>>
>> Hi Matt,
>>
>> It seems that the right way to do it is the following:
>>
>> -mca btl openib,self -mca btl_openib_ib_pkey_val 33033
>>
>> where the value is the decimal number of the pkey, in your case
>> 0x8109 = 33033, and there is no need for a btl_openib_ib_pkey_ix value.
>>
>> ex.
>> mpirun -np 2 -H witch2,witch3 -mca btl openib,self -mca
>> btl_openib_ib_pkey_val 32769 ./mpi_p1_4_1_2 -t lt
>> LT (2) (size min max avg) 1 3.511429 3.511429 3.511429
>>
>> If it's not working, check cat
>> /sys/class/infiniband/mthca0/ports/1/pkeys/* for pkeys and the SM;
>> maybe it's a setup issue.
>>
>> Pasha is currently checking this issue.
>>
>> Best regards,
>>
>> Lenny.
>>
>> On 10/7/08, Jeff Squyres <jsquy...@cisco.com> wrote:
>>
>> FWIW, if this configuration is for all of your users, you
>> might want to specify these MCA params in the default MCA
>> param file, or the environment, etc., just so that you
>> don't have to specify them on every mpirun command line.
>>
>> See
>>
>> http://www.open-mpi.org/faq/?category=tuning#setting-mca-params.
>>
>> On Oct 7, 2008, at 5:43 AM, Lenny Verkhovsky wrote:
>>
>> Sorry, I misunderstood the question.
>>
>> Thanks to Pasha, the right command line will be
>>
>> -mca btl openib,self -mca btl_openib_of_pkey_val 0x8109
>> -mca btl_openib_of_pkey_ix 1
>>
>> ex.
>>
>> #mpirun -np 2 -H witch2,witch3 -mca btl openib,self
>> -mca btl_openib_of_pkey_val 0x8001 -mca btl_openib_of_pkey_ix 1
>> ./mpi_p1_4_TRUNK -t lt
>> LT (2) (size min max avg) 1 3.443480 3.443480 3.443480
>>
>> Best regards
>>
>> Lenny.
>>
>> On 10/6/08, Jeff Squyres <jsquy...@cisco.com> wrote:
>>
>> On Oct 5, 2008, at 1:22 PM, Lenny Verkhovsky wrote:
>>
>> you should probably use -mca btl tcp,self -mca
>> btl_openib_if_include ib0.8109
>>
>> Really? I thought we only took OpenFabrics device names
>> in the openib_if_include MCA param...? It looks like
>> ib0.8109 is an IPoIB device name.
>>
>> Lenny.
>>
>> On 10/3/08, Matt Burgess <burgess.m...@gmail.com> wrote:
>>
>> Hi,
>>
>> I'm trying to get openmpi working over openib partitions.
>> On this cluster, the partition number is 0x109. The ib
>> interfaces are pingable over the appropriate ib0.8109
>> interface:
>>
>> d2:/opt/openmpi-ib # ifconfig ib0.8109
>> ib0.8109  Link encap:UNSPEC  HWaddr 80-00-00-4A-FE-80-00-00-00-00-00-00-00-00-00-00
>>           inet addr:10.21.48.2  Bcast:10.21.255.255  Mask:255.255.0.0
>>           inet6 addr: fe80::202:c902:26:ca01/64 Scope:Link
>>           UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
>>           RX packets:16811 errors:0 dropped:0 overruns:0 frame:0
>>           TX packets:15848 errors:0 dropped:1 overruns:0 carrier:0
>>           collisions:0 txqueuelen:256
>>           RX bytes:102229428 (97.4 Mb)  TX bytes:102324172 (97.5 Mb)
>>
>> I have tried the following:
>>
>> /opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -machinefile
>> machinefile -mca btl openib,self -mca btl_openib_max_btls 1
>> -mca btl_openib_ib_pkey_val 0x8109 -mca btl_openib_ib_pkey_ix 1
>> /cluster/pallas/x86_64-ib/IMB-MPI1
>>
>> but I just get a RETRY EXCEEDED ERROR. Is there an MCA
>> parameter I am missing?
>>
>> I was successful using tcp only:
>>
>> /opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -machinefile
>> machinefile -mca btl tcp,self -mca btl_openib_max_btls 1
>> -mca btl_openib_ib_pkey_val 0x8109
>> /cluster/pallas/x86_64-ib/IMB-MPI1
>>
>> Thanks,
>> Matt Burgess
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>> -- Jeff Squyres
>> Cisco Systems
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>> --
>> Pavel Shamis (Pasha)
>> Mellanox Technologies LTD.
>>
>> Index: ompi/mca/btl/openib/btl_openib_component.c
>> ===================================================================
>> --- ompi/mca/btl/openib/btl_openib_component.c (revision 19490)
>> +++ ompi/mca/btl/openib/btl_openib_component.c (working copy)
>> @@ -558,7 +558,7 @@ static int init_one_hca(opal_list_t *btl
>>          goto dealloc_pd;
>>      }
>>
>> -    ret = OMPI_SUCCESS;
>> +    ret = OMPI_SUCCESS;
>>      /* Note ports are 1 based hence j = 1 */
>>      for(i = 1; i <= hca->ib_dev_attr.phys_port_cnt; i++){
>>          struct ibv_port_attr ib_port_attr;
>> @@ -580,7 +580,7 @@ static int init_one_hca(opal_list_t *btl
>>              uint16_t pkey,j;
>>              for (j=0; j < hca->ib_dev_attr.max_pkeys; j++) {
>>                  ibv_query_pkey(hca->ib_dev_context, i, j, &pkey);
>> -                pkey=ntohs(pkey);
>> +                pkey=ntohs(pkey) & 0x7fff;
>>                  if(pkey == mca_btl_openib_component.ib_pkey_val){
>>                      ret = init_one_port(btl_list, hca, i, j, &ib_port_attr);
>>                      break;
>> Index: ompi/mca/btl/openib/btl_openib_ini.c
>> ===================================================================
>> --- ompi/mca/btl/openib/btl_openib_ini.c (revision 19490)
>> +++ ompi/mca/btl/openib/btl_openib_ini.c (working copy)
>> @@ -90,8 +90,6 @@ static int parse_line(parsed_section_val
>>  static void reset_section(bool had_previous_value, parsed_section_values_t *s);
>>  static void reset_values(ompi_btl_openib_ini_values_t *v);
>>  static int save_section(parsed_section_values_t *s);
>> -static int intify(char *string);
>> -static int intify_list(char *str, uint32_t **values, int *len);
>>  static inline void show_help(const char *topic);
>>
>>
>> @@ -364,14 +362,14 @@ static int parse_line(parsed_section_val
>>         all whitespace at the beginning and ending of the value. */
>>
>>      if (0 == strcasecmp(key_buffer, "vendor_id")) {
>> -        if (OMPI_SUCCESS != (ret = intify_list(value, &sv->vendor_ids,
>> +        if (OMPI_SUCCESS != (ret = ompi_btl_openib_ini_intify_list(value, &sv->vendor_ids,
>>                                                 &sv->vendor_ids_len))) {
>>              return ret;
>>          }
>>      }
>>
>>      else if (0 == strcasecmp(key_buffer, "vendor_part_id")) {
>> -        if (OMPI_SUCCESS != (ret = intify_list(value, &sv->vendor_part_ids,
>> +        if (OMPI_SUCCESS != (ret = ompi_btl_openib_ini_intify_list(value, &sv->vendor_part_ids,
>>                                                 &sv->vendor_part_ids_len))) {
>>              return ret;
>>          }
>> @@ -379,13 +377,13 @@ static int parse_line(parsed_section_val
>>
>>      else if (0 == strcasecmp(key_buffer, "mtu")) {
>>          /* Single value */
>> -        sv->values.mtu = (uint32_t) intify(value);
>> +        sv->values.mtu = (uint32_t) ompi_btl_openib_ini_intify(value);
>>          sv->values.mtu_set = true;
>>      }
>>
>>      else if (0 == strcasecmp(key_buffer, "use_eager_rdma")) {
>>          /* Single value */
>> -        sv->values.use_eager_rdma = (uint32_t) intify(value);
>> +        sv->values.use_eager_rdma = (uint32_t) ompi_btl_openib_ini_intify(value);
>>          sv->values.use_eager_rdma_set = true;
>>      }
>>
>> @@ -547,7 +545,7 @@ static int save_section(parsed_section_v
>>  /*
>>   * Do string-to-integer conversion, for both hex and decimal numbers
>>   */
>> -static int intify(char *str)
>> +int ompi_btl_openib_ini_intify(char *str)
>>  {
>>      while (isspace(*str)) {
>>          ++str;
>> @@ -568,7 +566,7 @@ static int intify(char *str)
>>  /*
>>   * Take a comma-delimited list and infity them all
>>   */
>> -static int intify_list(char *value, uint32_t **values, int *len)
>> +int ompi_btl_openib_ini_intify_list(char *value, uint32_t **values, int *len)
>>  {
>>      char *comma;
>>      char *str = value;
>> @@ -584,7 +582,7 @@ static int intify_list(char *value, uint
>>          if (NULL == *values) {
>>              return OMPI_ERR_OUT_OF_RESOURCE;
>>          }
>> -        *values[0] = (uint32_t) intify(str);
>> +        *values[0] = (uint32_t) ompi_btl_openib_ini_intify(str);
>>          *len = 1;
>>      } else {
>>          /* If we found a comma, loop over all the values. Be a
>> @@ -594,7 +592,7 @@ static int intify_list(char *value, uint
>>          do {
>>              *comma = '\0';
>>              *values = realloc(*values, sizeof(uint32_t) * (*len + 2));
>> -            (*values)[*len] = (int32_t) intify(str);
>> +            (*values)[*len] = (int32_t) ompi_btl_openib_ini_intify(str);
>>              ++(*len);
>>              str = comma + 1;
>>              comma = strchr(str, ',');
>> @@ -602,7 +600,7 @@ static int intify_list(char *value, uint
>>          /* Get the last value (i.e., the value after the last
>>             comma, because it won't have been snarfed in the
>>             loop) */
>> -        (*values)[*len] = (uint32_t) intify(str);
>> +        (*values)[*len] = (uint32_t) ompi_btl_openib_ini_intify(str);
>>          ++(*len);
>>      }
>>
>> Index: ompi/mca/btl/openib/btl_openib_ini.h
>> ===================================================================
>> --- ompi/mca/btl/openib/btl_openib_ini.h (revision 19490)
>> +++ ompi/mca/btl/openib/btl_openib_ini.h (working copy)
>> @@ -49,6 +49,9 @@ extern "C" {
>>       */
>>      int ompi_btl_openib_ini_finalize(void);
>>
>> +    int ompi_btl_openib_ini_intify(char *string);
>> +    int ompi_btl_openib_ini_intify_list(char *str, uint32_t **values, int *len);
>> +
>>  #if defined(c_plusplus) || defined(__cplusplus)
>>  }
>>  #endif
>> Index: ompi/mca/btl/openib/btl_openib_mca.c
>> ===================================================================
>> --- ompi/mca/btl/openib/btl_openib_mca.c (revision 19490)
>> +++ ompi/mca/btl/openib/btl_openib_mca.c (working copy)
>> @@ -27,6 +27,7 @@
>>  #include "opal/mca/base/mca_base_param.h"
>>  #include "btl_openib.h"
>>  #include "btl_openib_mca.h"
>> +#include "btl_openib_ini.h"
>>
>>  /*
>>   * Local flags
>> @@ -97,7 +98,7 @@ static inline int reg_int(const char* pa
>>   */
>>  int btl_openib_register_mca_params(void)
>>  {
>> -    char *msg, *str;
>> +    char *msg, *str, *pkey;
>>      int ival, ival2, ret, tmp;
>>
>>      ret = OMPI_SUCCESS;
>> @@ -192,13 +193,15 @@ int btl_openib_register_mca_params(void)
>>                    0, &ival, REGINT_GE_ZERO));
>>      mca_btl_openib_component.ib_pkey_ix = (uint32_t) ival;
>>
>> -    CHECK(reg_int("ib_pkey_val", "InfiniBand pkey value"
>> +    CHECK(reg_string("ib_pkey_val", "InfiniBand pkey value"
>>                    "(must be > 0 and < 0xffff)",
>> -                  0, &ival, REGINT_GE_ZERO));
>> -    if (ival > 0xffff) {
>> +                  "0", &pkey, 0));
>> +    mca_btl_openib_component.ib_pkey_val = ompi_btl_openib_ini_intify(pkey) & 0x7fff;
>> +    if (mca_btl_openib_component.ib_pkey_val > 0xffff ||
>> +        mca_btl_openib_component.ib_pkey_val < 0) {
>>          ret = OMPI_ERR_BAD_PARAM;
>>      }
>> -    mca_btl_openib_component.ib_pkey_val = (uint32_t) ival;
>> +    free(pkey);
>>
>>      CHECK(reg_int("ib_psn", "InfiniBand packet sequence starting number "
>>                    "(must be >= 0)",
>>
>
> --
> Pavel Shamis (Pasha)
> Mellanox Technologies LTD.
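
For reference, the pkey handling discussed in this thread can be sketched in isolation. This is a minimal, hypothetical sketch and not Open MPI code: the names pkey_from_string, pkey_base and pkey_matches are made up for illustration. It assumes only what the thread states: the value may be given in hex or decimal (0x8109 = 33033), and the patch compares pkeys after masking with 0x7fff, which drops the top (membership) bit, so 0x8109 and 0x109 name the same partition.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not part of Open MPI: parse a pkey written as
 * decimal ("33033") or hex ("0x8109"). Returns the 16-bit value, or -1
 * if the text is not a number in the range 0..0xffff. */
int pkey_from_string(const char *text)
{
    char *end = NULL;
    unsigned long value;

    assert(text != NULL);
    errno = 0;
    /* Base 0 lets strtoul() accept a "0x" prefix; note that it also
     * treats a leading "0" as octal. */
    value = strtoul(text, &end, 0);
    if (errno != 0 || end == text || *end != '\0' || value > 0xffffUL) {
        return -1;
    }
    return (int) value;
}

/* Keep the low 15 bits, dropping the membership bit (0x8000), in the
 * same way the patch does with "& 0x7fff". */
uint16_t pkey_base(uint16_t pkey)
{
    return (uint16_t) (pkey & 0x7fff);
}

/* Nonzero when a pkey table entry and the requested pkey name the
 * same partition, ignoring the membership bit on both sides. */
int pkey_matches(uint16_t table_entry, uint16_t requested)
{
    return pkey_base(table_entry) == pkey_base(requested);
}
```

With these helpers, pkey_from_string("0x8109") and pkey_from_string("33033") give the same value, and pkey_matches(0x8109, 0x0109) is true, while pkey_matches(0x8001, 0x8109) is false.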