Re: [PATCH v2] ibsim: allocate mft according to number of switch ports
Sasha, It seems that you've missed this patch. Please apply. Thanks, Eli Eli Dorfman (Voltaire) wrote: allocate mft according to number of switch ports calculate number of port masks according to number of switch ports and allocate MFT accordingly Signed-off-by: Eli Dorfman e...@voltaire.com --- ibsim/sim.h |5 ++--- ibsim/sim_mad.c |5 +++-- ibsim/sim_net.c |5 +++-- 3 files changed, 8 insertions(+), 7 deletions(-) diff --git a/ibsim/sim.h b/ibsim/sim.h index 30f5075..c4ee11f 100644 --- a/ibsim/sim.h +++ b/ibsim/sim.h @@ -45,9 +45,7 @@ #define MAXLINEARCAP (30*1024) #define MAXMCASTCAP 1024 #define LASTBLOCK32 (MAXMCASTCAP/32-1) -// NUMBEROFPORTMASK means that 32port switches could only be build -#define NUMBEROFPORTMASK 2 -#define LASTPORTMASK (NUMBEROFPORTMASK-1) +#define MCASTMASKSIZE16 // linkwidth == 4X - must be one width only 1,2 or 8 #define LINKWIDTH_1x1 #define LINKWIDTH_4x2 @@ -229,6 +227,7 @@ struct Switch { int linearFDBtop; int portchange; int lifetime; + int numportmask; uint8_t switchinfo[64]; Node *node; uint8_t *fdb; diff --git a/ibsim/sim_mad.c b/ibsim/sim_mad.c index 61d4866..3133bcf 100644 --- a/ibsim/sim_mad.c +++ b/ibsim/sim_mad.c @@ -513,18 +513,19 @@ static int do_multicastforwtbl(Port * port, unsigned op, uint32_t mod, int blockposition; Switch *sw = port-node-sw; + int lastportmask = sw-numportmask + 1; if (!sw)// not a Switch? return ERR_ATTR_UNSUPPORTED; VERB(requested : Block32 %d PortMask %d, numBlock32, numPortMsk); - if (numBlock32 LASTBLOCK32 || numPortMsk LASTPORTMASK) { + if (numBlock32 LASTBLOCK32 || numPortMsk lastportmask) { int8_t zeroblock[64] = { 0 }; mad_set_array(data, 0, IB_MULTICAST_FORW_TBL_F, zeroblock); return 0; } - blockposition = (numBlock32 * NUMBEROFPORTMASK + numPortMsk) * 64; + blockposition = (numBlock32 * sw-numportmask + numPortMsk) * 64; if (op == IB_MAD_METHOD_SET) mad_get_array(data, 0, IB_MULTICAST_FORW_TBL_F, sw-mfdb + blockposition); diff --git a/ibsim/sim_net.c b/ibsim/sim_net.c index 13c3b8c..f17f0b9 100644 --- a/ibsim/sim_net.c +++ b/ibsim/sim_net.c @@ -245,6 +245,7 @@ static Switch *new_switch(Node * nd, int set_esp0) sw-node = nd; sw-linearcap = maxlinearcap; // assume identical val for all switches sw-multicastcap = maxmcastcap; // assume identical val for all switches + sw-numportmask = (nd-numports + MCASTMASKSIZE) / MCASTMASKSIZE; memcpy(sw-switchinfo, switchinfo, sizeof(sw-switchinfo)); mad_set_field(sw-switchinfo, 0, IB_SW_LINEAR_FDB_CAP_F, sw-linearcap); mad_set_field(sw-switchinfo, 0, IB_SW_MCAST_FDB_CAP_F, @@ -253,13 +254,13 @@ static Switch *new_switch(Node * nd, int set_esp0) mad_set_field(sw-switchinfo, 0, IB_SW_ENHANCED_PORT0_F, set_esp0 0); sw-fdb = malloc(maxlinearcap*sizeof(sw-fdb[0])); - sw-mfdb = malloc(maxmcastcap*NUMBEROFPORTMASK*sizeof(uint16_t)); + sw-mfdb = malloc(maxmcastcap * sw-numportmask * sizeof(uint16_t)); if (!sw-fdb || !sw-mfdb) { IBPANIC(new_switch: no mem: %m); return NULL; } memset(sw-fdb, 0xff, maxlinearcap*sizeof(sw-fdb[0])); - memset(sw-mfdb, 0, maxmcastcap*NUMBEROFPORTMASK*sizeof(uint16_t)); + memset(sw-mfdb, 0, maxmcastcap * sw-numportmask * sizeof(uint16_t)); return sw; } -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: ibnetdiscover issue
Tom Ammon wrote: Eli, Is there a quick workaround we could put in place? I want to map out our fabric, and I especially need the spine GUIDs on the GD4200 because I'm going to be doing up/down routing and want to specify the root GUIDs. I can also submit a support case to Voltaire, if you think that would make it go faster. I want to make sure we are using OFED as distributed from OFA. Hi Tom, You should take the following patches: commit 5b034c90d72f9fbf382df414a1569e0a9f7ac679 Author: Eli Dorfman (Voltaire) dorfman@gmail.com Date: Sun Sep 5 11:30:37 2010 +0300 inifiband-diags: Support Voltaire switch ISR4200 Support Voltaire switch (ISR4200) grouping. Signed-off-by: Eli Dorfman e...@voltaire.com Signed-off-by: Sasha Khapyorsky sas...@voltaire.com commit e6a10b921627099f42bc4aea7ae68abcf4c1c970 Author: Eli Dorfman (Voltaire) dorfman@gmail.com Date: Sun Sep 5 11:28:57 2010 +0300 infiniband-diags: Do not exit when unexpected node found Show error message but do not exit when unexpected node is found. Signed-off-by: Eli Dorfman e...@voltaire.com Signed-off-by: Sasha Khapyorsky sas...@voltaire.com Eli Tom On 12/8/2010 11:28 AM, Hal Rosenstock wrote: Hi Tom, On 12/8/2010 12:48 PM, Tom Ammon wrote: Hi, I get the following when I try to run ibnetdiscover from a server plugged in to a voltaire 4036 switch. We're using OFED 1.5.2: [r...@sm1 ~]# ibnetdiscover src/chassis.c:535; Unexpected node found: guid 0x0008f1050075134c ibnetdiscover: iberror: failed: discover failed Looks to me like there's a missing is_spine_4200() clause missing in get_router_slot in libibnetdisc/src/chassis.c. Eli had added changes to support the 4200 so he's the best one to comment. -- Hal However, ibdiagnet runs fine: [r...@sm1 ~]# ibdiagnet Loading IBDIAGNET from: /usr/lib64/ibdiagnet1.5.4 -W- Topology file is not specified. Reports regarding cluster links will use direct routes. Loading IBDM from: /usr/lib64/ibdm1.5.4 -I- Using port 1 as the local port. -I- Discovering ... 277 nodes (23 Switches 254 CA-s) discovered. -I--- -I- Bad Guids/LIDs Info -I--- -I- No bad Guids were found -I--- -I- Links With Logical State = INIT -I--- -I- No bad Links (with logical state = INIT) were found -I--- -I- General Device Info -I--- -I--- -I- PM Counters Info -I--- -W- lid=0x0007 guid=0x0008f105006515ba dev=23131 Port=33 Performance Monitor counter : Value link_error_recovery_counter : 0xff (overflow) -W- lid=0x0010 guid=0x0008f10500201d7c dev=23130 Port=14 Performance Monitor counter : Value symbol_error_counter : 0x (overflow) -W- lid=0x0001 guid=0x0008f10500108a76 dev=23130 Port=30 Performance Monitor counter : Value symbol_error_counter : 0x (overflow) -I--- -I- Fabric Partitions Report (see ibdiagnet.pkey for a full hosts list) -I--- -I- PKey:0x7fff Hosts:254 full:254 limited:0 -I--- -I- IPoIB Subnets Check -I--- -I- Subnet: IPv4 PKey:0x7fff QKey:0x0b1b MTU:2048Byte rate:10Gbps SL:0x00 -I--- -I- Bad Links Info -I- No bad link were found -I--- -I- Stages Status Report: STAGE Errors Warnings Bad GUIDs/LIDs Check 0 0 Link State Active Check 0 0 General Devices Info Report 0 0 Performance Counters Report 0 3 Partitions Check 0 0 IPoIB Subnets Check 0 0 Please see /tmp/ibdiagnet.log for complete log -I- Done. Run time was 21 seconds. Any ideas? Tom -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] ibnetdiscover: fix error when discovery started on node with more that one port
When discovery is started on a node with more than one active port it tries to send mads from second Hca port and fails. ibnetdiscover should not try and extend_path from Hca except for starting point. Signed-off-by: Eli Dorfman e...@voltaire.com --- .../libibnetdisc/include/infiniband/ibnetdisc.h|1 + infiniband-diags/libibnetdisc/src/ibnetdisc.c |7 --- 2 files changed, 5 insertions(+), 3 deletions(-) diff --git a/infiniband-diags/libibnetdisc/include/infiniband/ibnetdisc.h b/infiniband-diags/libibnetdisc/include/infiniband/ibnetdisc.h index cfd3bbe..fe953e6 100644 --- a/infiniband-diags/libibnetdisc/include/infiniband/ibnetdisc.h +++ b/infiniband-diags/libibnetdisc/include/infiniband/ibnetdisc.h @@ -149,6 +149,7 @@ typedef struct ibnd_fabric { * or by default the node you ar running on */ ibnd_node_t *from_node; + int from_port; /* NULL term list of all nodes in the fabric */ ibnd_node_t *nodes; /* NULL terminated list of all chassis found in the fabric */ diff --git a/infiniband-diags/libibnetdisc/src/ibnetdisc.c b/infiniband-diags/libibnetdisc/src/ibnetdisc.c index f525d71..64e1c60 100644 --- a/infiniband-diags/libibnetdisc/src/ibnetdisc.c +++ b/infiniband-diags/libibnetdisc/src/ibnetdisc.c @@ -199,7 +199,7 @@ static int recv_port_info(smp_engine_t * engine, ibnd_smp_t * smp, if (port_num mad_get_field(port-info, 0, IB_PORT_PHYS_STATE_F) == IB_PORT_PHYS_STATE_LINKUP ((node-type == IB_NODE_SWITCH port_num != local_port) || - (node == fabric-from_node port_num == local_port))) { + (node == fabric-from_node port_num == fabric-from_port))) { ib_portid_t path = smp-path; if (extend_dpath(engine, path, port_num) 0) query_node_info(engine, path, node); @@ -324,9 +324,10 @@ static int recv_node_info(smp_engine_t * engine, ibnd_smp_t * smp, dump_endnode(smp-path, node_is_new ? new : known, node, port); - if (rem_node == NULL) /* this is the start node */ + if (rem_node == NULL) { /* this is the start node */ fabric-from_node = node; - else { + fabric-from_port = port_num; + } else { /* link ports... */ int rem_port_num = get_last_port(smp-path); -- 1.7.2.1 -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] ibsim: allocate mft according to number of switch ports
allocate mft according to number of switch ports calculate number of port masks according to number of switch ports and allocate MFT accordingly Signed-off-by: Eli Dorfman e...@voltaire.com --- ibsim/sim.h |5 ++--- ibsim/sim_mad.c |5 +++-- ibsim/sim_net.c |5 +++-- 3 files changed, 8 insertions(+), 7 deletions(-) diff --git a/ibsim/sim.h b/ibsim/sim.h index 30f5075..c4ee11f 100644 --- a/ibsim/sim.h +++ b/ibsim/sim.h @@ -45,9 +45,7 @@ #define MAXLINEARCAP (30*1024) #define MAXMCASTCAP1024 #define LASTBLOCK32(MAXMCASTCAP/32-1) -// NUMBEROFPORTMASK means that 32port switches could only be build -#defineNUMBEROFPORTMASK 2 -#define LASTPORTMASK (NUMBEROFPORTMASK-1) +#define MCASTMASKSIZE16 // linkwidth == 4X - must be one width only 1,2 or 8 #define LINKWIDTH_1x1 #define LINKWIDTH_4x2 @@ -229,6 +227,7 @@ struct Switch { int linearFDBtop; int portchange; int lifetime; + int numportmask; uint8_t switchinfo[64]; Node *node; uint8_t *fdb; diff --git a/ibsim/sim_mad.c b/ibsim/sim_mad.c index 61d4866..3133bcf 100644 --- a/ibsim/sim_mad.c +++ b/ibsim/sim_mad.c @@ -513,18 +513,19 @@ static int do_multicastforwtbl(Port * port, unsigned op, uint32_t mod, int blockposition; Switch *sw = port-node-sw; + int lastportmask = sw-numportmask + 1; if (!sw)// not a Switch? return ERR_ATTR_UNSUPPORTED; VERB(requested : Block32 %d PortMask %d, numBlock32, numPortMsk); - if (numBlock32 LASTBLOCK32 || numPortMsk LASTPORTMASK) { + if (numBlock32 LASTBLOCK32 || numPortMsk lastportmask) { int8_t zeroblock[64] = { 0 }; mad_set_array(data, 0, IB_MULTICAST_FORW_TBL_F, zeroblock); return 0; } - blockposition = (numBlock32 * NUMBEROFPORTMASK + numPortMsk) * 64; + blockposition = (numBlock32 * sw-numportmask + numPortMsk) * 64; if (op == IB_MAD_METHOD_SET) mad_get_array(data, 0, IB_MULTICAST_FORW_TBL_F, sw-mfdb + blockposition); diff --git a/ibsim/sim_net.c b/ibsim/sim_net.c index 13c3b8c..f17f0b9 100644 --- a/ibsim/sim_net.c +++ b/ibsim/sim_net.c @@ -245,6 +245,7 @@ static Switch *new_switch(Node * nd, int set_esp0) sw-node = nd; sw-linearcap = maxlinearcap; // assume identical val for all switches sw-multicastcap = maxmcastcap; // assume identical val for all switches + sw-numportmask = (nd-numports + MCASTMASKSIZE) / MCASTMASKSIZE; memcpy(sw-switchinfo, switchinfo, sizeof(sw-switchinfo)); mad_set_field(sw-switchinfo, 0, IB_SW_LINEAR_FDB_CAP_F, sw-linearcap); mad_set_field(sw-switchinfo, 0, IB_SW_MCAST_FDB_CAP_F, @@ -253,13 +254,13 @@ static Switch *new_switch(Node * nd, int set_esp0) mad_set_field(sw-switchinfo, 0, IB_SW_ENHANCED_PORT0_F, set_esp0 0); sw-fdb = malloc(maxlinearcap*sizeof(sw-fdb[0])); - sw-mfdb = malloc(maxmcastcap*NUMBEROFPORTMASK*sizeof(uint16_t)); + sw-mfdb = malloc(maxmcastcap * sw-numportmask * sizeof(uint16_t)); if (!sw-fdb || !sw-mfdb) { IBPANIC(new_switch: no mem: %m); return NULL; } memset(sw-fdb, 0xff, maxlinearcap*sizeof(sw-fdb[0])); - memset(sw-mfdb, 0, maxmcastcap*NUMBEROFPORTMASK*sizeof(uint16_t)); + memset(sw-mfdb, 0, maxmcastcap * sw-numportmask * sizeof(uint16_t)); return sw; } -- 1.7.2.1 -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] ibsim: support MFT for switches with up to 48 ports
support MFT for switches with up to 48 ports Signed-off-by: Eli Dorfman e...@voltaire.com --- ibsim/sim.h |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/ibsim/sim.h b/ibsim/sim.h index 5a8a92f..0de7fdc 100644 --- a/ibsim/sim.h +++ b/ibsim/sim.h @@ -45,8 +45,8 @@ #define MAXLINEARCAP (30*1024) #define MAXMCASTCAP1024 #define LASTBLOCK32(MAXMCASTCAP/32-1) -// NUMBEROFPORTMASK means that 32port switches could only be build -#defineNUMBEROFPORTMASK 2 +// NUMBEROFPORTMASK means that 48port switches could only be build +#defineNUMBEROFPORTMASK 3 #define LASTPORTMASK (NUMBEROFPORTMASK-1) // linkwidth == 4X - must be one width only 1,2 or 8 #define LINKWIDTH_1x1 -- 1.5.5 -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] ibsim: parse width and speed from ibnetdiscover output
Search for the pattern '[0-9]*x[SDQ]DR' Validate that width value is valid. Signed-off-by: Eli Dorfman e...@voltaire.com --- ibsim/sim_net.c | 48 1 files changed, 48 insertions(+), 0 deletions(-) diff --git a/ibsim/sim_net.c b/ibsim/sim_net.c index 13c3b8c..278c065 100644 --- a/ibsim/sim_net.c +++ b/ibsim/sim_net.c @@ -611,6 +611,7 @@ static int parse_port(char *line, Node * node, int type, int maxports) int portnum, isalias = 0; Port *port; char *s; + char *p; if (line[0] == '@') { isalias = 1; @@ -671,6 +672,7 @@ static int parse_port(char *line, Node * node, int type, int maxports) } parse_opt: line = s; + p = s; while (s (s = strchr(s + 1, '='))) { char *opt = s; while (opt !isalpha(*opt)) @@ -685,6 +687,52 @@ static int parse_port(char *line, Node * node, int type, int maxports) IBWARN(cannot parse lid, lmc); return -1; } + + /* parse width speed from '[0-9]*x[SDQ]DR' pattern */ + /* also checking for valid width value */ + char wstr[3]; + char sstr[3]; + s = p; + while (s (s = strchr(s + 1, 'x'))) { + int width; + p = s - 1; + while (p !isdigit(*p)) + p--; + width = atoi(p); + switch (width) { + case 1: strcpy(wstr, 1); + break; + case 4: strcpy(wstr, 2); + break; + case 8: strcpy(wstr, 4); + break; + case 12: strcpy(wstr, 8); + break; + default: wstr[0] = sstr[0] = 0; + continue; + } + p = p + 2; + if (!strncmp(SDR, p, 3)) + strcpy(sstr, 1); + else if (!strncmp(DDR, p, 3)) + strcpy(sstr, 2); + else if (!strncmp(QDR, p, 3)) + strcpy(sstr, 4); + else { + wstr[0] = sstr[0] = 0; + continue; + } + } + + if (parse_port_opt(port, w, wstr) 0) { + IBWARN(bad port option); + return -1; + } + + if (parse_port_opt(port, s, sstr) 0) { + IBWARN(bad port option); + return -1; + } return 1; } -- 1.7.2.1 -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] ibnetdiscover: add '-f' flag to show full information (ports' speed and width).
Sasha Khapyorsky wrote: On 09:45 Tue 24 Aug , Hal Rosenstock wrote: What about the flag? do we still need it if we pass the output after the comment? I wouldn't think so. I also think we've made commentary changes to the ibnetdiscover output format like this before. If we wanted to be absolutely sure it wouldn't break anything, we'd keep the flag though. It's up to Sasha. The '-f' flag could make sense since those information actually duplicates previous 4xDDR, etc. and used by ibsim's parser only. Also it would be nice to make ibsim's parser to understand both possibilities: s=2 and speed=2 (Or even to parse *x*DR strings :)) Is it valid to assume that *x*DR string is always the last token? If so, the patch should be changed Eli Sasha -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] opensm: Fix sl2vl configuration
Hi Hal, Hal Rosenstock wrote: On Wed, Aug 25, 2010 at 4:43 AM, Eli Dorfman (Voltaire) dorfman@gmail.com wrote: Hal Rosenstock wrote: Hi Eli, On Mon, Aug 23, 2010 at 3:37 PM, Eli Dorfman dorfman@gmail.com wrote: Hi Hal, On Mon, Aug 23, 2010 at 5:25 PM, Hal Rosenstock hal.rosenst...@gmail.com wrote: On Wed, Jul 28, 2010 at 12:26 PM, Eli Dorfman (Voltaire) dorfman@gmail.com wrote: Subject: [PATCH] Fix sl2vl configuration For non-optimized sl2vl configuration in and out ports were reversed. Are you sure these are reversed ? Any idea which commit introduced this reversal of in and out ports ? I'm sure they are reversed. I looked at it some more and the ports look reversed to me too. This patch was also tested by Jim Schutt. That only means it works in his environment rather than correctness. It was tested in mine enviornment as well. Usually I test the patch in my environment to verify its correctness (in addition to code review). I assume you do the same. Do you expect me to test the patch in your environment as well? Huh ? I didn't check which commit introduced this - why is it important? I'd like to understand which patch introduced the reversal. I don't see it but might have missed it. I think it's important to know which versions are broken. I think that the following commit added the bug: commit 051a1dd514e63f3a971afad38e377932efca5e18 Author: Sasha Khapyorsky sas...@voltaire.com Date: Mon Jan 4 21:06:19 2010 +0200 opensm/osm_qos.c: split switch external and end ports setup This splits QoS related port parameters setup over switch external ports and end ports. Such separation leaves us with simpler code and saves some repeated flows in case of switch external ports (actually required per switch and not per port). Another advantages: Optimized Sl2VL Mapping procedure can be implemented easier using this model. A low level port QoS related parameters setup infrastructure is ready for supporting per port QoS related configuration (which hopefully will be implemented some days). I thought it might be that one but wasn't sure. Thanks. For optimal sl2vl added override of default ALL settting with port's sl2vl when operational VL was other than the default port. What is the motivation to override when the operational VLs is different ? Why is that better than what is done currently ? The idea was to apply the default policy - set sl2vl modulo operational VL. When applying ALL settings using port 1 we still want to override this setting for ports with different operational VL. What makes the default policy modulo operational VLs ? This is how its implemented in sl2vl_update_table() vl_mask = (1 (ib_port_info_get_op_vls(p-port_info) - 1)) - 1; for (i = 0; i IB_MAX_NUM_VLS / 2; i++) { vl1 = sl2vl_table-raw_vl_by_sl[i] 4; vl2 = sl2vl_table-raw_vl_by_sl[i] 0xf; if (vl1 != 15) vl1 = vl_mask; if (vl2 != 15) vl2 = vl_mask; Default startup switches configuration uses the same policy. I view this as further modification on the configuration based on the operational VLs. Besides, any VL specified above the op VLs is a drop (same as indicating VL15). This code is already in the trunk and my patch does not change that. The only diff in the patch is the fix of reversed in/out ports and corresponding fix for optimized sl2vl configuration. I think what you are doing is changing the default behavior and hence perhaps an additional policy switch is needed if this change really is needed and I'm unsure of that. I don't change the default behavior and this change is needed. Without it for example, if you connect a switch with opVL=4 (VL0-VL7) to a node that supports only VL0 and try to send traffic on slSL0 it will be dropped. Eli Is this really a policy issue ? IMO there are two separate issues in this patch and they should be in separate patches (for better git bisection). Maybe, but I still think this patch fixes wrong setting for sl2vl using optimized and non optimized methods. I'm not sure this should be divided to 2 separate patches. It's one thought per patch and to me this is two different thoughts. If this is the only issue, I can split this patch to 2 separate patches. It's also the issue noted above. -- Hal Eli Also, a couple of (possibly related) questions below. It seems that you are referring to patch v1 which was modified according to Jim's comments. Please check the v2 patch . I see my questions are moot in terms of the v2 patch. -- Hal Thanks, Eli Signed-off-by: Eli Dorfman e...@voltaire.com --- opensm/opensm/osm_qos.c | 25 - 1 files changed, 20 insertions(+), 5 deletions(-) diff --git a/opensm/opensm/osm_qos.c b/opensm/opensm/osm_qos.c index a571370..de0ae23 100644 --- a/opensm/opensm
Re: [PATCH] opensm: Fix sl2vl configuration
Hal Rosenstock wrote: Hi Eli, On Mon, Aug 23, 2010 at 3:37 PM, Eli Dorfman dorfman@gmail.com wrote: Hi Hal, On Mon, Aug 23, 2010 at 5:25 PM, Hal Rosenstock hal.rosenst...@gmail.com wrote: On Wed, Jul 28, 2010 at 12:26 PM, Eli Dorfman (Voltaire) dorfman@gmail.com wrote: Subject: [PATCH] Fix sl2vl configuration For non-optimized sl2vl configuration in and out ports were reversed. Are you sure these are reversed ? Any idea which commit introduced this reversal of in and out ports ? I'm sure they are reversed. I looked at it some more and the ports look reversed to me too. This patch was also tested by Jim Schutt. That only means it works in his environment rather than correctness. It was tested in mine enviornment as well. Usually I test the patch in my environment to verify its correctness (in addition to code review). I assume you do the same. Do you expect me to test the patch in your environment as well? I didn't check which commit introduced this - why is it important? I'd like to understand which patch introduced the reversal. I don't see it but might have missed it. I think it's important to know which versions are broken. I think that the following commit added the bug: commit 051a1dd514e63f3a971afad38e377932efca5e18 Author: Sasha Khapyorsky sas...@voltaire.com Date: Mon Jan 4 21:06:19 2010 +0200 opensm/osm_qos.c: split switch external and end ports setup This splits QoS related port parameters setup over switch external ports and end ports. Such separation leaves us with simpler code and saves some repeated flows in case of switch external ports (actually required per switch and not per port). Another advantages: Optimized Sl2VL Mapping procedure can be implemented easier using this model. A low level port QoS related parameters setup infrastructure is ready for supporting per port QoS related configuration (which hopefully will be implemented some days). For optimal sl2vl added override of default ALL settting with port's sl2vl when operational VL was other than the default port. What is the motivation to override when the operational VLs is different ? Why is that better than what is done currently ? The idea was to apply the default policy - set sl2vl modulo operational VL. When applying ALL settings using port 1 we still want to override this setting for ports with different operational VL. What makes the default policy modulo operational VLs ? This is how its implemented in sl2vl_update_table() vl_mask = (1 (ib_port_info_get_op_vls(p-port_info) - 1)) - 1; for (i = 0; i IB_MAX_NUM_VLS / 2; i++) { vl1 = sl2vl_table-raw_vl_by_sl[i] 4; vl2 = sl2vl_table-raw_vl_by_sl[i] 0xf; if (vl1 != 15) vl1 = vl_mask; if (vl2 != 15) vl2 = vl_mask; Default startup switches configuration uses the same policy. Is this really a policy issue ? IMO there are two separate issues in this patch and they should be in separate patches (for better git bisection). Maybe, but I still think this patch fixes wrong setting for sl2vl using optimized and non optimized methods. I'm not sure this should be divided to 2 separate patches. It's one thought per patch and to me this is two different thoughts. If this is the only issue, I can split this patch to 2 separate patches. Eli Also, a couple of (possibly related) questions below. It seems that you are referring to patch v1 which was modified according to Jim's comments. Please check the v2 patch . I see my questions are moot in terms of the v2 patch. -- Hal Thanks, Eli Signed-off-by: Eli Dorfman e...@voltaire.com --- opensm/opensm/osm_qos.c | 25 - 1 files changed, 20 insertions(+), 5 deletions(-) diff --git a/opensm/opensm/osm_qos.c b/opensm/opensm/osm_qos.c index a571370..de0ae23 100644 --- a/opensm/opensm/osm_qos.c +++ b/opensm/opensm/osm_qos.c @@ -182,7 +182,7 @@ static ib_api_status_t sl2vl_update_table(osm_sm_t * sm, osm_physp_t * p, tbl.raw_vl_by_sl[i] = (vl1 4) | vl2; } - if (!force_update (p_tbl = osm_physp_get_slvl_tbl(p, in_port)) + if (!force_update in_port (p_tbl = osm_physp_get_slvl_tbl(p, in_port)) !memcmp(p_tbl, tbl, sizeof(tbl))) return IB_SUCCESS; Why exclude port 0 here ? Is it related to the change noted below ? @@ -209,6 +209,7 @@ static int qos_extports_setup(osm_sm_t * sm, osm_node_t *node, unsigned num_ports = osm_node_get_num_physp(node); int ret = 0; unsigned i, j; + uint8_t op_vl1; for (i = 1; i num_ports; i++) { p = osm_node_get_physp_ptr(node, i); @@ -225,17 +226,31 @@ static int qos_extports_setup(osm_sm_t * sm, osm_node_t *node, if (ib_switch_info_get_opt_sl2vlmapping(node-sw-switch_info
Re: [PATCH] opensm: Added option for IPv4 MGID multiplexing.
This patch implements IPv4 MGID to MLID mapping in the same scheme as multicast IP to MAC mapping. Users may choose multicast IPs to use the same MLID or not. Slava Strebkov wrote: When option is enabled, same mlid may be assigned to multicast groups created by IPoIB IPv4. If there is no difference in 23 lsb bits of MGID, multicast group will got same mlid. Signed-off-by: Slava Strebkov sla...@voltaire.com --- opensm/include/opensm/osm_subnet.h |3 +- opensm/opensm/main.c |9 ++- opensm/opensm/osm_sa_mcmember_record.c | 42 +++- opensm/opensm/osm_subnet.c |9 ++- 4 files changed, 59 insertions(+), 4 deletions(-) diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h index 95a635c..505f8bd 100644 --- a/opensm/include/opensm/osm_subnet.h +++ b/opensm/include/opensm/osm_subnet.h @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004-2009 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2010 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2010 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * Copyright (c) 2008 Xsigo Systems Inc. All rights reserved. @@ -231,6 +231,7 @@ typedef struct osm_subn_opt { char *prefix_routes_file; char *log_prefix; boolean_t consolidate_ipv6_snm_req; + boolean_t consolidate_ipv4; struct osm_subn_opt *file_opts; /* used for update */ uint8_t lash_start_vl; /* starting vl to use in lash */ uint8_t sm_sl; /* which SL to use for SM/SA communication */ diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c index 6e6c733..b476b98 100644 --- a/opensm/opensm/main.c +++ b/opensm/opensm/main.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004-2009 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2010 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2009 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * Copyright (c) 2009 HNR Consulting. All rights reserved. @@ -328,6 +328,9 @@ static void show_usage(void) printf(--consolidate_ipv6_snm_req\n Use shared MLID for IPv6 Solicited Node Multicast groups\n per MGID scope and P_Key.\n\n); + printf(--consolidate_ipv4\n + Use shared MLID for IPv4 Multicast groups\n + per MGID scope and P_Key.\n\n); printf(--log_prefix prefix text\n Prefix to syslog messages from OpenSM.\n\n); printf(--verbose, -v\n @@ -610,6 +613,7 @@ int main(int argc, char *argv[]) #endif {prefix_routes_file, 1, NULL, 3}, {consolidate_ipv6_snm_req, 0, NULL, 4}, + {consolidate_ipv4, 0, NULL, 11}, {do_mesh_analysis, 0, NULL, 5}, {lash_start_vl, 1, NULL, 6}, {sm_sl, 1, NULL, 7}, @@ -974,6 +978,9 @@ int main(int argc, char *argv[]) case 4: opt.consolidate_ipv6_snm_req = TRUE; break; + case 11: + opt.consolidate_ipv4 = TRUE; + break; case 5: opt.do_mesh_analysis = TRUE; break; diff --git a/opensm/opensm/osm_sa_mcmember_record.c b/opensm/opensm/osm_sa_mcmember_record.c index 93c2767..1ddbe55 100644 --- a/opensm/opensm/osm_sa_mcmember_record.c +++ b/opensm/opensm/osm_sa_mcmember_record.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004-2009 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2010 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2009 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * Copyright (c) 2008 Xsigo Systems Inc. All rights reserved. @@ -122,6 +122,36 @@ static void free_mlid(IN osm_sa_t * sa, IN uint16_t mlid) #define PREFIX_SIGNATURE CL_HTON64(0xff10601bULL) #define INT_ID_MASK CL_HTON64(0xfff1ff00ULL) #define INT_ID_SIGNATURE CL_HTON64(0x0001ff00ULL) +#define IPV4_PREFIX_MASK CL_HTON64(0xff10) +#define IPV4_MUX_MASKCL_HTON32(0x007f) +#define PREFIX_SIGNATURE_IPV4 CL_HTON64(0xff10401bULL) + +static int compare_ipv4_mux_mgids(const void *m1, const void *m2) +{ + int res = memcmp(m1, m2, sizeof(ib_gid_t) - 4); + if (res) + return res; + uint32_t cmp_m1, cmp_m2; + cmp_m1 = *(uint32_t*)(((ib_gid_t*)m1)-raw[12]); + cmp_m2 = *(uint32_t*)(((ib_gid_t*)m2)-raw[12]); + cmp_m1 = IPV4_MUX_MASK; + cmp_m2 = IPV4_MUX_MASK; + return memcmp (cmp_m1, cmp_m2, sizeof(cmp_m1)); +} + +static ib_net16_t find_ipv4_mux_mlid(osm_subn_t *subn, ib_gid_t *mgid) +{ +
Re: [PATCH v2] opensm: Fix sl2vl configuration
Fix the patch according to Jim's comments Changed i,j to in,out to make code readable For non-optimal sl2vl configuration in and out ports were reversed. For optimal sl2vl added override of default ALL settting with port's sl2vl when operational VL was other than the default port. Signed-off-by: Eli Dorfman e...@voltaire.com --- opensm/opensm/osm_qos.c | 35 +-- 1 files changed, 25 insertions(+), 10 deletions(-) diff --git a/opensm/opensm/osm_qos.c b/opensm/opensm/osm_qos.c index a571370..0ebcfdf 100644 --- a/opensm/opensm/osm_qos.c +++ b/opensm/opensm/osm_qos.c @@ -208,10 +208,11 @@ static int qos_extports_setup(osm_sm_t * sm, osm_node_t *node, unsigned force_update; unsigned num_ports = osm_node_get_num_physp(node); int ret = 0; - unsigned i, j; + unsigned in, out; + uint8_t op_vl1; - for (i = 1; i num_ports; i++) { - p = osm_node_get_physp_ptr(node, i); + for (out = 1; out num_ports; out++) { + p = osm_node_get_physp_ptr(node, out); force_update = p-need_update || sm-p_subn-need_update; p-vl_high_limit = qcfg-vl_high_limit; if (vlarb_update(sm, p, p-port_num, force_update, qcfg)) @@ -225,17 +226,31 @@ static int qos_extports_setup(osm_sm_t * sm, osm_node_t *node, if (ib_switch_info_get_opt_sl2vlmapping(node-sw-switch_info) sm-p_subn-opt.use_optimized_slvl) { p = osm_node_get_physp_ptr(node, 1); + op_vl1 = ib_port_info_get_op_vls(p-port_info); force_update = p-need_update || sm-p_subn-need_update; - return sl2vl_update_table(sm, p, 1, 0x3, force_update, - qcfg-sl2vl); + if (sl2vl_update_table(sm, p, 0, 0x3, force_update, + qcfg-sl2vl)) + ret = -1; + /* overwrite default ALL configuration if port's + op_vl is different */ + for (out = 2; out num_ports; out++) { + p = osm_node_get_physp_ptr(node, out); + if (ib_port_info_get_op_vls(p-port_info) != op_vl1 + sl2vl_update_table(sm, p, 0, 0x2 | out, force_update, + qcfg-sl2vl)) + ret = -1; + } + return ret; } - for (i = 0; i num_ports; i++) { - p = osm_node_get_physp_ptr(node, i); + /* non optimized sl2vl configuration */ + out = ib_switch_info_is_enhanced_port0(node-sw-switch_info) ? 0 : 1; + for (; out num_ports; out++) { + p = osm_node_get_physp_ptr(node, out); force_update = p-need_update || sm-p_subn-need_update; - j = ib_switch_info_is_enhanced_port0(node-sw-switch_info) ? 0 : 1; - for (; j num_ports; j++) - if (sl2vl_update_table(sm, p, i, i 8 | j, + /* go over all in ports */ + for (in = 0; in num_ports; in++) + if (sl2vl_update_table(sm, p, in, in 8 | out, force_update, qcfg-sl2vl)) ret = -1; } -- 1.5.5 -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] opensm: Fix sl2vl configuration
Subject: [PATCH] Fix sl2vl configuration For non-optimized sl2vl configuration in and out ports were reversed. For optimal sl2vl added override of default ALL settting with port's sl2vl when operational VL was other than the default port. Signed-off-by: Eli Dorfman e...@voltaire.com --- opensm/opensm/osm_qos.c | 25 - 1 files changed, 20 insertions(+), 5 deletions(-) diff --git a/opensm/opensm/osm_qos.c b/opensm/opensm/osm_qos.c index a571370..de0ae23 100644 --- a/opensm/opensm/osm_qos.c +++ b/opensm/opensm/osm_qos.c @@ -182,7 +182,7 @@ static ib_api_status_t sl2vl_update_table(osm_sm_t * sm, osm_physp_t * p, tbl.raw_vl_by_sl[i] = (vl1 4) | vl2; } - if (!force_update (p_tbl = osm_physp_get_slvl_tbl(p, in_port)) + if (!force_update in_port (p_tbl = osm_physp_get_slvl_tbl(p, in_port)) !memcmp(p_tbl, tbl, sizeof(tbl))) return IB_SUCCESS; @@ -209,6 +209,7 @@ static int qos_extports_setup(osm_sm_t * sm, osm_node_t *node, unsigned num_ports = osm_node_get_num_physp(node); int ret = 0; unsigned i, j; + uint8_t op_vl1; for (i = 1; i num_ports; i++) { p = osm_node_get_physp_ptr(node, i); @@ -225,17 +226,31 @@ static int qos_extports_setup(osm_sm_t * sm, osm_node_t *node, if (ib_switch_info_get_opt_sl2vlmapping(node-sw-switch_info) sm-p_subn-opt.use_optimized_slvl) { p = osm_node_get_physp_ptr(node, 1); + op_vl1 = ib_port_info_get_op_vls(p-port_info); force_update = p-need_update || sm-p_subn-need_update; - return sl2vl_update_table(sm, p, 1, 0x3, force_update, - qcfg-sl2vl); + if (sl2vl_update_table(sm, p, 0, 0x3, force_update, + qcfg-sl2vl)) + ret = -1; + /* overwrite default ALL configuration if port's + op_vl is different */ + for (i = 2; i num_ports; i++) { + p = osm_node_get_physp_ptr(node, i); + if (ib_port_info_get_op_vls(p-port_info) != op_vl1 + sl2vl_update_table(sm, p, 0, 0x2 | i, force_update, + qcfg-sl2vl)) + ret = -1; + } + return ret; } - for (i = 0; i num_ports; i++) { + /* non optimized sl2vl configuration */ + i = ib_switch_info_is_enhanced_port0(node-sw-switch_info) ? 0 : 1; + for (; i num_ports; i++) { p = osm_node_get_physp_ptr(node, i); force_update = p-need_update || sm-p_subn-need_update; j = ib_switch_info_is_enhanced_port0(node-sw-switch_info) ? 0 : 1; for (; j num_ports; j++) - if (sl2vl_update_table(sm, p, i, i 8 | j, + if (sl2vl_update_table(sm, p, j, j 8 | i, force_update, qcfg-sl2vl)) ret = -1; } -- 1.5.5 -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] opensm: Modify OSM_LOG_SYS messages
Modify OSM_LOG_SYS messages Signed-off-by: Eli Dorfman e...@voltaire.com --- opensm/opensm/osm_node_info_rcv.c |7 ++- opensm/opensm/osm_sa.c|2 +- opensm/opensm/osm_ucast_file.c|4 ++-- 3 files changed, 5 insertions(+), 8 deletions(-) diff --git a/opensm/opensm/osm_node_info_rcv.c b/opensm/opensm/osm_node_info_rcv.c index b3e272c..d24e757 100644 --- a/opensm/opensm/osm_node_info_rcv.c +++ b/opensm/opensm/osm_node_info_rcv.c @@ -72,8 +72,8 @@ static void report_duplicated_guid(IN osm_sm_t * sm, osm_physp_t * p_physp, p_old = p_physp-p_remote_physp; p_new = osm_node_get_physp_ptr(p_neighbor_node, port_num); - OSM_LOG(sm-p_log, OSM_LOG_ERROR, ERR 0D01: - Found duplicated node.\n + OSM_LOG(sm-p_log, OSM_LOG_SYS | OSM_LOG_ERROR, ERR 0D01: + Found duplicated node GUID.\n Node 0x% PRIx64 port %u is reachable from remote node 0x% PRIx64 port %u and remote node 0x% PRIx64 port %u.\n Paths are:\n, @@ -91,9 +91,6 @@ static void report_duplicated_guid(IN osm_sm_t * sm, osm_physp_t * p_physp, DR path with hop count %d couldn't be extended\n, path.hop_count); osm_dump_dr_path(sm-p_log, path, OSM_LOG_ERROR); - - osm_log(sm-p_log, OSM_LOG_SYS, - FATAL: duplicated guids or 12x lane reversal\n); } static void requery_dup_node_info(IN osm_sm_t * sm, osm_physp_t * p_physp, diff --git a/opensm/opensm/osm_sa.c b/opensm/opensm/osm_sa.c index 3473e4c..ef9872d 100644 --- a/opensm/opensm/osm_sa.c +++ b/opensm/opensm/osm_sa.c @@ -923,7 +923,7 @@ int osm_sa_db_file_load(osm_opensm_t * p_osm) file = fopen(file_name, r); if (!file) { OSM_LOG(p_osm-log, OSM_LOG_ERROR | OSM_LOG_SYS, ERR 4C02: - cannot open sa db file \'%s\'. Skip restoring\n, + Can't open sa db file \'%s\'. Skip restoring\n, file_name); return -1; } diff --git a/opensm/opensm/osm_ucast_file.c b/opensm/opensm/osm_ucast_file.c index ef79fc1..8cb3cb6 100644 --- a/opensm/opensm/osm_ucast_file.c +++ b/opensm/opensm/osm_ucast_file.c @@ -142,7 +142,7 @@ static int do_ucast_file_load(void *context) file = fopen(file_name, r); if (!file) { OSM_LOG(p_osm-log, OSM_LOG_ERROR | OSM_LOG_SYS, ERR 6302: - cannot open ucast dump file \'%s\': %m\n, file_name); + Can't open ucast dump file \'%s\': %m\n, file_name); return -1; } @@ -269,7 +269,7 @@ static int do_lid_matrix_file_load(void *context) file = fopen(file_name, r); if (!file) { OSM_LOG(p_osm-log, OSM_LOG_ERROR | OSM_LOG_SYS, ERR 6305: - cannot open lid matrix file \'%s\': %m\n, file_name); + Can't open lid matrix file \'%s\': %m\n, file_name); return -1; } -- 1.5.3.6 -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] opensm: Fix wrong messages in MC delete flow
Fix wrong messages in MC delete and update flows. The requester GID was wrong. Signed-off-by: Eli Dorfman e...@voltaire.com --- opensm/opensm/osm_sa_mcmember_record.c | 11 --- 1 files changed, 4 insertions(+), 7 deletions(-) diff --git a/opensm/opensm/osm_sa_mcmember_record.c b/opensm/opensm/osm_sa_mcmember_record.c index 07aeb6c..eda2fdd 100644 --- a/opensm/opensm/osm_sa_mcmember_record.c +++ b/opensm/opensm/osm_sa_mcmember_record.c @@ -405,8 +405,7 @@ static boolean_t validate_modify(IN osm_sa_t * sa, IN osm_mgrp_t * p_mgrp, 0x%016 PRIx64 request:0x%016 PRIx64 \n, cl_ntoh64((*pp_mcm_port)-port_gid.unicast. interface_id), - cl_ntoh64(p_mad_addr-addr_type.gsi.grh_info. - src_gid.unicast.interface_id)); + cl_ntoh64(request_gid.unicast.interface_id)); return FALSE; } } else { @@ -422,11 +421,9 @@ static boolean_t validate_modify(IN osm_sa_t * sa, IN osm_mgrp_t * p_mgrp, p_request_physp)) { /* the request port is not part of the partition for this mgrp */ OSM_LOG(sa-p_log, OSM_LOG_DEBUG, - ProxyJoin but port not in partition. stored: - 0x%016 PRIx64 request:0x%016 PRIx64 \n, - cl_ntoh64((*pp_mcm_port)-port-guid), - cl_ntoh64(p_mad_addr-addr_type.gsi.grh_info. - src_gid.unicast.interface_id)); + Requesting port 0x%016 PRIx64 has no P_Key 0x%04x\n, + cl_ntoh64(p_request_physp-port_guid), + cl_ntoh16(p_mgrp-mcmember_rec.pkey)); return FALSE; } } -- 1.5.5 -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] opensm: Modify connect_roots to allow connectivity between all switches
Yevgeny Kliteynik wrote: Eli, On 13-Jun-10 6:10 PM, Eli Dorfman (Voltaire) wrote: After a second thought and in order not to break current configuration, I send this modified patch that does not change connect_roots option but changes its functionality in up-down (I think that in fat-tree it is already implemented) Modify connect_roots option to allow connectivity between all switches in up-down routing algorithm and in this way be fully IBA compliant Signed-off-by: Eli Dorfmane...@voltaire.com --- opensm/man/opensm.8.in |2 +- opensm/opensm/main.c |2 +- opensm/opensm/osm_ucast_updn.c |4 +--- 3 files changed, 3 insertions(+), 5 deletions(-) diff --git a/opensm/man/opensm.8.in b/opensm/man/opensm.8.in index 9053611..c67126e 100644 --- a/opensm/man/opensm.8.in +++ b/opensm/man/opensm.8.in @@ -174,7 +174,7 @@ the host comes back online. .TP \fB\-z\fR, \fB\-\-connect_roots\fR This option enforces routing engines (up/down and -fat-tree) to make connectivity between root switches and in +fat-tree) to make connectivity between all switches and in this way to be fully IBA complaint. In many cases this can violate pure deadlock free algorithm, so use it carefully. .TP diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c index 0093aa7..82ca78f 100644 --- a/opensm/opensm/main.c +++ b/opensm/opensm/main.c @@ -187,7 +187,7 @@ static void show_usage(void) Sets the SL to use to communicate with the SM/SA. Defaults to 0.\n\n); printf(--connect_roots, -z\n This option enforces routing engines (up/down and \n - fat-tree) to make connectivity between root switches\n + fat-tree) to make connectivity between all switches\n and in this way be IBA compliant. In many cases,\n this can violate \pure\ deadlock free algorithm, so\n use it carefully.\n\n); diff --git a/opensm/opensm/osm_ucast_updn.c b/opensm/opensm/osm_ucast_updn.c index 164c6f4..f44ca24 100644 --- a/opensm/opensm/osm_ucast_updn.c +++ b/opensm/opensm/osm_ucast_updn.c @@ -314,9 +314,7 @@ static int updn_set_min_hop_table(IN updn_t * p_updn) item = cl_qmap_next(item)) { p_sw = (osm_switch_t *)item; /* Clear Min Hop Table */ -if (p_subn-opt.connect_roots) -updn_clear_non_root_hops(p_updn, p_sw); -else +if (!p_subn-opt.connect_roots) osm_switch_clear_hops(p_sw); } What kind of testing was done for this? I have a strong feeling that it will break up/down. If the connect_roots option is on, you will not clear the lid matrix at all, and it will contain also the down/up routes. that is the idea. instead of leaving only up/dpwn routes we want to keep routes between all switches. the routes between host nodes will still be up/down. Eli -- Yevgeny -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] opensm: Modify connect_roots to allow connectivity between all switches
Yevgeny Kliteynik wrote: On 15-Jun-10 10:09 AM, Eli Dorfman (Voltaire) wrote: Yevgeny Kliteynik wrote: Eli, On 13-Jun-10 6:10 PM, Eli Dorfman (Voltaire) wrote: After a second thought and in order not to break current configuration, I send this modified patch that does not change connect_roots option but changes its functionality in up-down (I think that in fat-tree it is already implemented) Modify connect_roots option to allow connectivity between all switches in up-down routing algorithm and in this way be fully IBA compliant Signed-off-by: Eli Dorfmane...@voltaire.com --- opensm/man/opensm.8.in |2 +- opensm/opensm/main.c |2 +- opensm/opensm/osm_ucast_updn.c |4 +--- 3 files changed, 3 insertions(+), 5 deletions(-) diff --git a/opensm/man/opensm.8.in b/opensm/man/opensm.8.in index 9053611..c67126e 100644 --- a/opensm/man/opensm.8.in +++ b/opensm/man/opensm.8.in @@ -174,7 +174,7 @@ the host comes back online. .TP \fB\-z\fR, \fB\-\-connect_roots\fR This option enforces routing engines (up/down and -fat-tree) to make connectivity between root switches and in +fat-tree) to make connectivity between all switches and in this way to be fully IBA complaint. In many cases this can violate pure deadlock free algorithm, so use it carefully. .TP diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c index 0093aa7..82ca78f 100644 --- a/opensm/opensm/main.c +++ b/opensm/opensm/main.c @@ -187,7 +187,7 @@ static void show_usage(void) Sets the SL to use to communicate with the SM/SA. Defaults to 0.\n\n); printf(--connect_roots, -z\n This option enforces routing engines (up/down and \n - fat-tree) to make connectivity between root switches\n + fat-tree) to make connectivity between all switches\n and in this way be IBA compliant. In many cases,\n this can violate \pure\ deadlock free algorithm, so\n use it carefully.\n\n); diff --git a/opensm/opensm/osm_ucast_updn.c b/opensm/opensm/osm_ucast_updn.c index 164c6f4..f44ca24 100644 --- a/opensm/opensm/osm_ucast_updn.c +++ b/opensm/opensm/osm_ucast_updn.c @@ -314,9 +314,7 @@ static int updn_set_min_hop_table(IN updn_t * p_updn) item = cl_qmap_next(item)) { p_sw = (osm_switch_t *)item; /* Clear Min Hop Table */ -if (p_subn-opt.connect_roots) -updn_clear_non_root_hops(p_updn, p_sw); -else +if (!p_subn-opt.connect_roots) osm_switch_clear_hops(p_sw); } What kind of testing was done for this? I have a strong feeling that it will break up/down. If the connect_roots option is on, you will not clear the lid matrix at all, and it will contain also the down/up routes. that is the idea. instead of leaving only up/dpwn routes we want to keep routes between all switches. the routes between host nodes will still be up/down. Lid matrix might contain down/up routes between many kinds of switches, not only between roots. This means that you might have down/up paths between leaf switches, even though you could have found up/down path for them. This also means that you might end up with down/up routes between HCAs that are connected to these leafs. ok I'll check this again. Thanks, Eli -- Yevgeny Eli -- Yevgeny -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] opensm: Allow connectivity between switches in up down and fat tree
Modify option connect_roots to connect_switches and allow connectivity between all switches (not only root nodes) in up down and fat tree routing algorithms Signed-off-by: Eli Dorfman e...@voltaire.com --- opensm/include/opensm/osm_subnet.h |6 +++--- opensm/man/opensm.8.in |6 +++--- opensm/opensm/main.c |8 opensm/opensm/osm_subnet.c | 10 +- opensm/opensm/osm_ucast_ftree.c|2 +- opensm/opensm/osm_ucast_updn.c |6 ++ 6 files changed, 18 insertions(+), 20 deletions(-) diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h index 95a635c..9ef4c9e 100644 --- a/opensm/include/opensm/osm_subnet.h +++ b/opensm/include/opensm/osm_subnet.h @@ -193,7 +193,7 @@ typedef struct osm_subn_opt { boolean_t sweep_on_trap; char *routing_engine_names; boolean_t use_ucast_cache; - boolean_t connect_roots; + boolean_t connect_switches; char *lid_matrix_dump_file; char *lfts_file; char *root_guid_file; @@ -388,8 +388,8 @@ typedef struct osm_subn_opt { * routing_engine_names * Name of routing engine(s) to use. * -* connect_roots -* The option which will enforce root to root connectivity with +* connect_switches +* The option which will enforce all switch connectivity with * up/down and fat-tree routing engines (even if this violates * pure deadlock free up/down or fat-tree algorithm) * diff --git a/opensm/man/opensm.8.in b/opensm/man/opensm.8.in index 9053611..e530320 100644 --- a/opensm/man/opensm.8.in +++ b/opensm/man/opensm.8.in @@ -18,7 +18,7 @@ opensm \- InfiniBand subnet manager and administration (SM/SA) [\-\-do_mesh_analysis] [\-\-lash_start_vl vl number] [\-A | \-\-ucast_cache] -[\-z | \-\-connect_roots] +[\-z | \-\-connect_switches] [\-M file name | \-\-lid_matrix_file file name] [\-U file name | \-\-lfts_file file name] [\-S | \-\-sadb_file file name] @@ -172,9 +172,9 @@ is host reboot, which otherwise would cause two full routing recalculations: one when the host goes down, and the other when the host comes back online. .TP -\fB\-z\fR, \fB\-\-connect_roots\fR +\fB\-z\fR, \fB\-\-connect_switches\fR This option enforces routing engines (up/down and -fat-tree) to make connectivity between root switches and in +fat-tree) to make connectivity between all switches and in this way to be fully IBA complaint. In many cases this can violate pure deadlock free algorithm, so use it carefully. .TP diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c index 0093aa7..95f4432 100644 --- a/opensm/opensm/main.c +++ b/opensm/opensm/main.c @@ -185,7 +185,7 @@ static void show_usage(void) Defaults to 0.\n); printf(--sm_sl sl number\n Sets the SL to use to communicate with the SM/SA. Defaults to 0.\n\n); - printf(--connect_roots, -z\n + printf(--connect_switches, -z\n This option enforces routing engines (up/down and \n fat-tree) to make connectivity between root switches\n and in this way be IBA compliant. In many cases,\n @@ -587,7 +587,7 @@ int main(int argc, char *argv[]) {smkey, 1, NULL, 'k'}, {routing_engine, 1, NULL, 'R'}, {ucast_cache, 0, NULL, 'A'}, - {connect_roots, 0, NULL, 'z'}, + {connect_switches, 0, NULL, 'z'}, {lid_matrix_file, 1, NULL, 'M'}, {lfts_file, 1, NULL, 'U'}, {sadb_file, 1, NULL, 'S'}, @@ -887,8 +887,8 @@ int main(int argc, char *argv[]) break; case 'z': - opt.connect_roots = TRUE; - printf( Connect roots option is on\n); + opt.connect_switches = TRUE; + printf( Connect switches option is on\n); break; case 'A': diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c index d5c5ab2..054df50 100644 --- a/opensm/opensm/osm_subnet.c +++ b/opensm/opensm/osm_subnet.c @@ -330,7 +330,7 @@ static const opt_rec_t opt_tbl[] = { { port_profile_switch_nodes, OPT_OFFSET(port_profile_switch_nodes), opts_parse_boolean, NULL, 1 }, { sweep_on_trap, OPT_OFFSET(sweep_on_trap), opts_parse_boolean, NULL, 1 }, { routing_engine, OPT_OFFSET(routing_engine_names), opts_parse_charp, NULL, 0 }, - { connect_roots, OPT_OFFSET(connect_roots), opts_parse_boolean, NULL, 1 }, + { connect_switches, OPT_OFFSET(connect_switches), opts_parse_boolean, NULL, 1 }, { use_ucast_cache, OPT_OFFSET(use_ucast_cache), opts_parse_boolean, NULL, 0 }, { log_file, OPT_OFFSET(log_file), opts_parse_charp, NULL, 0 }, { log_max_size, OPT_OFFSET(log_max_size), opts_parse_uint32,
[PATCH v2] opensm: Modify connect_roots to allow connectivity between all switches
After a second thought and in order not to break current configuration, I send this modified patch that does not change connect_roots option but changes its functionality in up-down (I think that in fat-tree it is already implemented) Modify connect_roots option to allow connectivity between all switches in up-down routing algorithm and in this way be fully IBA compliant Signed-off-by: Eli Dorfman e...@voltaire.com --- opensm/man/opensm.8.in |2 +- opensm/opensm/main.c |2 +- opensm/opensm/osm_ucast_updn.c |4 +--- 3 files changed, 3 insertions(+), 5 deletions(-) diff --git a/opensm/man/opensm.8.in b/opensm/man/opensm.8.in index 9053611..c67126e 100644 --- a/opensm/man/opensm.8.in +++ b/opensm/man/opensm.8.in @@ -174,7 +174,7 @@ the host comes back online. .TP \fB\-z\fR, \fB\-\-connect_roots\fR This option enforces routing engines (up/down and -fat-tree) to make connectivity between root switches and in +fat-tree) to make connectivity between all switches and in this way to be fully IBA complaint. In many cases this can violate pure deadlock free algorithm, so use it carefully. .TP diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c index 0093aa7..82ca78f 100644 --- a/opensm/opensm/main.c +++ b/opensm/opensm/main.c @@ -187,7 +187,7 @@ static void show_usage(void) Sets the SL to use to communicate with the SM/SA. Defaults to 0.\n\n); printf(--connect_roots, -z\n This option enforces routing engines (up/down and \n -fat-tree) to make connectivity between root switches\n +fat-tree) to make connectivity between all switches\n and in this way be IBA compliant. In many cases,\n this can violate \pure\ deadlock free algorithm, so\n use it carefully.\n\n); diff --git a/opensm/opensm/osm_ucast_updn.c b/opensm/opensm/osm_ucast_updn.c index 164c6f4..f44ca24 100644 --- a/opensm/opensm/osm_ucast_updn.c +++ b/opensm/opensm/osm_ucast_updn.c @@ -314,9 +314,7 @@ static int updn_set_min_hop_table(IN updn_t * p_updn) item = cl_qmap_next(item)) { p_sw = (osm_switch_t *)item; /* Clear Min Hop Table */ - if (p_subn-opt.connect_roots) - updn_clear_non_root_hops(p_updn, p_sw); - else + if (!p_subn-opt.connect_roots) osm_switch_clear_hops(p_sw); } -- 1.5.5 -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] return no path when path does not exist
return OSM_NO_PATH (instead of port num) when path does not exists. this will also be reported as error in the log. Signed-off-by: Eli Dorfman e...@voltaire.com --- opensm/opensm/osm_switch.c |2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/opensm/opensm/osm_switch.c b/opensm/opensm/osm_switch.c index 311c4f7..b621852 100644 --- a/opensm/opensm/osm_switch.c +++ b/opensm/opensm/osm_switch.c @@ -628,6 +628,8 @@ uint8_t osm_switch_recommend_mcast_path(IN osm_switch_t * p_sw, a black hole that will destroy the Earth in a firey conflagration. */ least_hops = osm_switch_get_least_hops(p_sw, base_lid); + if (least_hops == OSM_NO_PATH) + return OSM_NO_PATH; for (port_num = 1; port_num num_ports; port_num++) if (osm_switch_get_hop_count(p_sw, base_lid, port_num) == least_hops) -- 1.5.5 -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] opensm: Always set enabled speed and width
Initialize default link speed and width to No State Change and always set in SubnSet(PortInfo) Signed-off-by: Eli Dorfman e...@voltaire.com --- infiniband-diags/src/ibportstate.c | 17 - 1 files changed, 8 insertions(+), 9 deletions(-) diff --git a/infiniband-diags/src/ibportstate.c b/infiniband-diags/src/ibportstate.c index 66f069e..b99c0e1 100644 --- a/infiniband-diags/src/ibportstate.c +++ b/infiniband-diags/src/ibportstate.c @@ -64,13 +64,13 @@ enum port_ops { }; struct ibmad_port *srcport; -int speed = 15; -int width = 255; +int speed = 0; /* no state change */ +int width = 0; /* no state change */ int lid; int smlid; int lmc; int mtu; -int vls; +int vls = 0; /* no state change */ struct { const char *name; @@ -394,12 +394,11 @@ int main(int argc, char **argv) mad_set_field(data, 0, IB_PORT_STATE_F, 4); break; } - if (port_args[SPEED].set) - mad_set_field(data, 0, IB_PORT_LINK_SPEED_ENABLED_F, - speed); - if (port_args[WIDTH].set) - mad_set_field(data, 0, IB_PORT_LINK_WIDTH_ENABLED_F, - width); + + /* always set enabled speed/width - defaults to NOP */ + mad_set_field(data, 0, IB_PORT_LINK_SPEED_ENABLED_F, speed); + mad_set_field(data, 0, IB_PORT_LINK_WIDTH_ENABLED_F, width); + if (port_args[VLS].set) mad_set_field(data, 0, IB_PORT_OPER_VLS_F, vls); if (port_args[MTU].set) -- 1.5.5 -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3] opensm: support routing engine update
Slava Strebkov wrote: HI! That was my misunderstanding - upon heavy sweep SM will try to load routing engines as defined in the SM.conf file (ftree updn - in that order). So ftree will be loaded when switch comes back from reboot. Slava -Original Message- From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Yevgeny Kliteynik Sent: Monday, December 07, 2009 10:22 AM To: Slava Strebkov Cc: linux-rdma@vger.kernel.org Subject: Re: [PATCH v3] opensm: support routing engine update Slava, Slava Strebkov wrote: Hi Yevgeny, In that case SM will use updn and will not come back to ftree automatically. I think that this is a bad thing. I wouldn't want *temporary* change of fabric to cause *permanent* change of SM mode of operation. Such changes do happen, and I'd prefer SM to continue functioning in accordance to the user's configuration once the fabric is settled again. I do see the cases were the change that you propose is beneficial - if fabric topology doesn't fits the chosen routing, SM will waste time on retrying the wrong routing at every heavy sweep, but this happens due to suboptimal SM configuration and not as a result of some event that user has no control of. For every heavy sweep the SM will try to configure the routing engines as specified in the conf file. So when a switch goes up and ftree configuration is valid, the SM will configure ftree instead of updn. Eli -- Yevgeny Slava -Original Message- From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Yevgeny Kliteynik Sent: Sunday, December 06, 2009 6:03 PM To: Slava Strebkov Cc: linux-rdma@vger.kernel.org Subject: Re: [PATCH v3] opensm: support routing engine update Slava, Slava Strebkov wrote: setup routing engine when in use and delete when failed. setup routing engine before use. delete resources when routing algorithm fails. this will save allocation for routing algorithms that are not used. Suppose a user runs SM with ftree updn routings (in that order), and SM manages to route the fabric with ftree. At some point some switch reboots and causes ftree to fail and SM routes the fabric with updn. Does this mean that ftree will be removed from the list, and when the switch comes back, SM won't try ftree any more? -- Yevgeny Signed-off-by: Slava Strebkov sla...@voltaire.com --- opensm/include/opensm/osm_opensm.h |5 +++ opensm/opensm/osm_opensm.c | 57 +++- opensm/opensm/osm_subnet.c |7 - opensm/opensm/osm_ucast_mgr.c | 28 + 4 files changed, 88 insertions(+), 9 deletions(-) diff --git a/opensm/include/opensm/osm_opensm.h b/opensm/include/opensm/osm_opensm.h index c121be4..ca0fddb 100644 --- a/opensm/include/opensm/osm_opensm.h +++ b/opensm/include/opensm/osm_opensm.h @@ -109,6 +109,7 @@ typedef enum _osm_routing_engine_type { } osm_routing_engine_type_t; /***/ +struct osm_opensm; /s* OpenSM: OpenSM/osm_routing_engine * NAME * struct osm_routing_engine @@ -122,6 +123,8 @@ typedef enum _osm_routing_engine_type { struct osm_routing_engine { const char *name; void *context; + int initialized; + int (*setup) (struct osm_routing_engine *re, struct osm_opensm *p_osm); int (*build_lid_matrices) (void *context); int (*ucast_build_fwd_tables) (void *context); void (*ucast_dump_tables) (void *context); @@ -183,6 +186,7 @@ typedef struct osm_opensm { cl_dispatcher_t disp; cl_plock_t lock; struct osm_routing_engine *routing_engine_list; + struct osm_routing_engine *last_routing_engine; osm_routing_engine_type_t routing_engine_used; osm_stats_t stats; osm_console_t console; @@ -522,6 +526,7 @@ extern volatile unsigned int osm_exit_flag; * DESCRIPTION * Set to one to cause all threads to leave */ +void osm_update_routing_engines(osm_opensm_t *osm, const char *engine_names); END_C_DECLS #endif /* _OSM_OPENSM_H_ */ diff --git a/opensm/opensm/osm_opensm.c b/opensm/opensm/osm_opensm.c index 50d1349..f90584d 100644 --- a/opensm/opensm/osm_opensm.c +++ b/opensm/opensm/osm_opensm.c @@ -169,14 +169,7 @@ static void setup_routing_engine(osm_opensm_t *osm, const char *name) memset(re, 0, sizeof(struct osm_routing_engine)); re-name = m-name; - if (m-setup(re, osm)) { - OSM_LOG(osm-log, OSM_LOG_VERBOSE, - setup of routing -engine \'%s\' failed\n, name); - return; - } - OSM_LOG(osm-log, OSM_LOG_DEBUG, - \'%s\' routing engine set up\n, re-name); + re-setup = m-setup;
Re: [PATCH v2] opensm: bug in trap report for MC create(66) and delete(67) traps
Hal Rosenstock wrote: On Sun, Feb 7, 2010 at 4:47 AM, Eli Dorfman (Voltaire) dorfman@gmail.com wrote: Hal Rosenstock wrote: On Fri, Feb 5, 2010 at 9:18 AM, Eli Dorfman dorfman@gmail.com wrote: On Thu, Feb 4, 2010 at 10:52 PM, Hal Rosenstock hal.rosenst...@gmail.com wrote: On Thu, Feb 4, 2010 at 12:43 PM, Eli Dorfman (Voltaire) dorfman@gmail.com wrote: Subject: [PATCH] Wrong handling of MC create and delete traps For these traps the GID in the data details is the MGID and not the source port gid. So the SM should check that subscriber port has the pkey of the MC group. There was also an error in comparing the subnet prefix and guid due to host/network order mismatch. Signed-off-by: Eli Dorfman e...@voltaire.com --- opensm/opensm/osm_inform.c | 151 --- 1 files changed, 98 insertions(+), 53 deletions(-) diff --git a/opensm/opensm/osm_inform.c b/opensm/opensm/osm_inform.c index 8108213..ae4fe71 100644 --- a/opensm/opensm/osm_inform.c +++ b/opensm/opensm/osm_inform.c @@ -341,6 +341,103 @@ Exit: return status; } +static int is_access_permitted( osm_infr_t *p_infr_rec, + osm_infr_match_ctxt_t *p_infr_match ) +{ + cl_list_t *p_infr_to_remove_list = p_infr_match-p_remove_infr_list; + ib_inform_info_t *p_ii = (p_infr_rec-inform_record.inform_info); + ib_mad_notice_attr_t *p_ntc = p_infr_match-p_ntc; + uint16_t trap_num = cl_ntoh16(p_ntc-g_or_v.generic.trap_num); + osm_subn_t *p_subn = p_infr_rec-sa-p_subn; + osm_log_t *p_log = p_infr_rec-sa-p_log; + char gid_str[INET6_ADDRSTRLEN]; + osm_mgrp_t *p_mgrp; + ib_gid_t source_gid; + osm_port_t *p_src_port; + osm_port_t *p_dest_port; + + /* In case of GID_IN(64) or GID_OUT(65) traps the source gid + comparison should be done on the trap source (saved as the gid in the + data details field). + For traps MC_CREATE(66) or MC_DELETE(67) the data details gid is + the MGID. We need to check whether subscriber has the pky of typo pkey + the MC group. Shouldn't this be the subscriber has a compatible pkey with MC group ? The MC group has a full member PKey and the members can be full or limited. I accept the correction. Doesn't this require a code change for handling trap cases 66-67 ? I think that you referred to the comment since the code is handling this properly (in my opinion). I was referring to both the comment and the code since a port with a compatible limited pkey should be able to receive the reports for MC groups. I agree and I think that the code is handling this case properly. osm_physp_has_pkey() takes the 15 lower MGID pkey bits and checks whether it is the physp pkey table. Eli Sasha, can you please change this in the commit (only if there are not other remarks). Is that what you are asking Sasha to do (beyond the typos) ? I asked Sasha to fix only the typo in commit. BTW, there is no explicit reference in the IB spec for MC group create/delete trap (at least I didn't find it). Not sure what you mean by this. What didn't you find ? in the spec see o13-17.1.2 Yes, there appear to be some holes in the spec in terms of this and maybe more in this area (event forwarding) but the intent seems clear. -- Hal Thanks, Eli -- Hal + In all other cases the issuer gis is the trap source. typo ^^^ gid and this typo of course. Thanks, Eli -- Hal + */ + if (trap_num = 64 trap_num = 67 ) + /* The issuer of these traps is the SM so source_gid + is the gid saved on the data details */ + source_gid = p_ntc-data_details.ntc_64_67.gid; + else + source_gid = p_ntc-issuer_gid; + + p_dest_port = + cl_ptr_vector_get(p_subn-port_lid_tbl, + cl_ntoh16(p_infr_rec-report_addr.dest_lid)); + if (!p_dest_port) { + OSM_LOG(p_log, OSM_LOG_INFO, + Cannot find destination port with LID:%u\n, + cl_ntoh16(p_infr_rec-report_addr.dest_lid)); + goto Exit; + } + + switch (trap_num) { + case 66: + case 67: + p_mgrp = osm_get_mgrp_by_mgid(p_subn, source_gid); + if (!p_mgrp) { + OSM_LOG(p_log, OSM_LOG_INFO, + Cannot find MGID %s\n, + inet_ntop(AF_INET6, source_gid.raw, gid_str, sizeof gid_str)); + goto Exit; + } + + if (!osm_physp_has_pkey
Re: [PATCH v2] opensm: bug in trap report for MC create(66) and delete(67) traps
Hal Rosenstock wrote: On Fri, Feb 5, 2010 at 9:18 AM, Eli Dorfman dorfman@gmail.com wrote: On Thu, Feb 4, 2010 at 10:52 PM, Hal Rosenstock hal.rosenst...@gmail.com wrote: On Thu, Feb 4, 2010 at 12:43 PM, Eli Dorfman (Voltaire) dorfman@gmail.com wrote: Subject: [PATCH] Wrong handling of MC create and delete traps For these traps the GID in the data details is the MGID and not the source port gid. So the SM should check that subscriber port has the pkey of the MC group. There was also an error in comparing the subnet prefix and guid due to host/network order mismatch. Signed-off-by: Eli Dorfman e...@voltaire.com --- opensm/opensm/osm_inform.c | 151 --- 1 files changed, 98 insertions(+), 53 deletions(-) diff --git a/opensm/opensm/osm_inform.c b/opensm/opensm/osm_inform.c index 8108213..ae4fe71 100644 --- a/opensm/opensm/osm_inform.c +++ b/opensm/opensm/osm_inform.c @@ -341,6 +341,103 @@ Exit: return status; } +static int is_access_permitted( osm_infr_t *p_infr_rec, + osm_infr_match_ctxt_t *p_infr_match ) +{ + cl_list_t *p_infr_to_remove_list = p_infr_match-p_remove_infr_list; + ib_inform_info_t *p_ii = (p_infr_rec-inform_record.inform_info); + ib_mad_notice_attr_t *p_ntc = p_infr_match-p_ntc; + uint16_t trap_num = cl_ntoh16(p_ntc-g_or_v.generic.trap_num); + osm_subn_t *p_subn = p_infr_rec-sa-p_subn; + osm_log_t *p_log = p_infr_rec-sa-p_log; + char gid_str[INET6_ADDRSTRLEN]; + osm_mgrp_t *p_mgrp; + ib_gid_t source_gid; + osm_port_t *p_src_port; + osm_port_t *p_dest_port; + + /* In case of GID_IN(64) or GID_OUT(65) traps the source gid + comparison should be done on the trap source (saved as the gid in the + data details field). + For traps MC_CREATE(66) or MC_DELETE(67) the data details gid is + the MGID. We need to check whether subscriber has the pky of typo pkey + the MC group. Shouldn't this be the subscriber has a compatible pkey with MC group ? The MC group has a full member PKey and the members can be full or limited. I accept the correction. Doesn't this require a code change for handling trap cases 66-67 ? I think that you referred to the comment since the code is handling this properly (in my opinion). Sasha, can you please change this in the commit (only if there are not other remarks). Is that what you are asking Sasha to do (beyond the typos) ? I asked Sasha to fix only the typo in commit. BTW, there is no explicit reference in the IB spec for MC group create/delete trap (at least I didn't find it). Not sure what you mean by this. What didn't you find ? in the spec see o13-17.1.2 Thanks, Eli -- Hal + In all other cases the issuer gis is the trap source. typo ^^^ gid and this typo of course. Thanks, Eli -- Hal + */ + if (trap_num = 64 trap_num = 67 ) + /* The issuer of these traps is the SM so source_gid + is the gid saved on the data details */ + source_gid = p_ntc-data_details.ntc_64_67.gid; + else + source_gid = p_ntc-issuer_gid; + + p_dest_port = + cl_ptr_vector_get(p_subn-port_lid_tbl, + cl_ntoh16(p_infr_rec-report_addr.dest_lid)); + if (!p_dest_port) { + OSM_LOG(p_log, OSM_LOG_INFO, + Cannot find destination port with LID:%u\n, + cl_ntoh16(p_infr_rec-report_addr.dest_lid)); + goto Exit; + } + + switch (trap_num) { + case 66: + case 67: + p_mgrp = osm_get_mgrp_by_mgid(p_subn, source_gid); + if (!p_mgrp) { + OSM_LOG(p_log, OSM_LOG_INFO, + Cannot find MGID %s\n, + inet_ntop(AF_INET6, source_gid.raw, gid_str, sizeof gid_str)); + goto Exit; + } + + if (!osm_physp_has_pkey(p_log, + p_mgrp-mcmember_rec.pkey, + p_dest_port-p_physp)) { + OSM_LOG(p_log, OSM_LOG_INFO, + MGID %s and port GUID:0x%016 PRIx64 do not share same pkey\n, + inet_ntop(AF_INET6, source_gid.raw, gid_str, sizeof gid_str), + cl_ntoh64(p_dest_port-guid)); + goto Exit
[PATCH v2] opensm: bug in trap report for MC create(66) and delete(67) traps
Subject: [PATCH] Wrong handling of MC create and delete traps For these traps the GID in the data details is the MGID and not the source port gid. So the SM should check that subscriber port has the pkey of the MC group. There was also an error in comparing the subnet prefix and guid due to host/network order mismatch. Signed-off-by: Eli Dorfman e...@voltaire.com --- opensm/opensm/osm_inform.c | 151 --- 1 files changed, 98 insertions(+), 53 deletions(-) diff --git a/opensm/opensm/osm_inform.c b/opensm/opensm/osm_inform.c index 8108213..ae4fe71 100644 --- a/opensm/opensm/osm_inform.c +++ b/opensm/opensm/osm_inform.c @@ -341,6 +341,103 @@ Exit: return status; } +static int is_access_permitted( osm_infr_t *p_infr_rec, + osm_infr_match_ctxt_t *p_infr_match ) +{ + cl_list_t *p_infr_to_remove_list = p_infr_match-p_remove_infr_list; + ib_inform_info_t *p_ii = (p_infr_rec-inform_record.inform_info); + ib_mad_notice_attr_t *p_ntc = p_infr_match-p_ntc; + uint16_t trap_num = cl_ntoh16(p_ntc-g_or_v.generic.trap_num); + osm_subn_t *p_subn = p_infr_rec-sa-p_subn; + osm_log_t *p_log = p_infr_rec-sa-p_log; + char gid_str[INET6_ADDRSTRLEN]; + osm_mgrp_t *p_mgrp; + ib_gid_t source_gid; + osm_port_t *p_src_port; + osm_port_t *p_dest_port; + + /* In case of GID_IN(64) or GID_OUT(65) traps the source gid + comparison should be done on the trap source (saved as the gid in the + data details field). + For traps MC_CREATE(66) or MC_DELETE(67) the data details gid is + the MGID. We need to check whether subscriber has the pky of + the MC group. + In all other cases the issuer gis is the trap source. + */ + if (trap_num = 64 trap_num = 67 ) + /* The issuer of these traps is the SM so source_gid + is the gid saved on the data details */ + source_gid = p_ntc-data_details.ntc_64_67.gid; + else + source_gid = p_ntc-issuer_gid; + + p_dest_port = + cl_ptr_vector_get(p_subn-port_lid_tbl, + cl_ntoh16(p_infr_rec-report_addr.dest_lid)); + if (!p_dest_port) { + OSM_LOG(p_log, OSM_LOG_INFO, + Cannot find destination port with LID:%u\n, + cl_ntoh16(p_infr_rec-report_addr.dest_lid)); + goto Exit; + } + + switch (trap_num) { + case 66: + case 67: + p_mgrp = osm_get_mgrp_by_mgid(p_subn, source_gid); + if (!p_mgrp) { + OSM_LOG(p_log, OSM_LOG_INFO, + Cannot find MGID %s\n, + inet_ntop(AF_INET6, source_gid.raw, gid_str, sizeof gid_str)); + goto Exit; + } + + if (!osm_physp_has_pkey(p_log, + p_mgrp-mcmember_rec.pkey, + p_dest_port-p_physp)) { + OSM_LOG(p_log, OSM_LOG_INFO, + MGID %s and port GUID:0x%016 PRIx64 do not share same pkey\n, + inet_ntop(AF_INET6, source_gid.raw, gid_str, sizeof gid_str), + cl_ntoh64(p_dest_port-guid)); + goto Exit; + } + break; + + default: + p_src_port = + osm_get_port_by_guid(p_subn, source_gid.unicast.interface_id); + if (!p_src_port) { + OSM_LOG(p_log, OSM_LOG_INFO, + Cannot find source port with GUID:0x%016 PRIx64 \n, + cl_ntoh64(source_gid.unicast.interface_id)); + goto Exit; + } + + + /* Check if there is a pkey match. o13-17.1.1 */ + if (osm_port_share_pkey(p_log, p_src_port, p_dest_port) == FALSE) { + OSM_LOG(p_log, OSM_LOG_DEBUG, Mismatch by Pkey\n); + /* According to o13-17.1.2 - If this informInfo does not have + lid_range_begin of 0x, then this informInfo request + should be removed from database */ + if (p_ii-lid_range_begin != 0x) { + OSM_LOG(p_log, OSM_LOG_VERBOSE, + Pkey mismatch on lid_range_begin != 0x. + Need to remove
opensm: suggestion to define new common port_groups file
Currently there is a duplication of port group definition in OpenSM configuration files. Using a single configuration file for port groups will make it easier to manage and will allow using group name as keyword in other configuration files (i.e. partitions.conf, qos-policy.conf and opensm.conf) The suggested format for port-groups.conf is a simple list: GroupA: GUID1, GUID2, GUID3 GroupB: GUID4, GUID5, GUID6 All configuration files shall still support the old format (of specifying GUIDs explicitly) and will also recognize the new Groups. Following an example of partitions.conf file with new group definition: management=0x7fff,ipoib,defmember=full: GroupA=full, GroupB=limited, ALL_CAS=limited; and qos-policy.conf with the new group definition: port-groups port-group port-guid: GroupA, GroupB name: my.group end-port-group end-port-groups Thanks, Eli -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] opensm: bug in trap report for MC create(66) and delete(67) traps
In this case the GID in the data details is the MGID and not the source gid. So the SM gid should be taken as the source gid. There was also an error in comparing the subnet prefix and guid due to host/network order mismatch. Signed-off-by: Eli Dorfman e...@voltaire.com --- opensm/opensm/osm_inform.c | 19 +-- 1 files changed, 9 insertions(+), 10 deletions(-) diff --git a/opensm/opensm/osm_inform.c b/opensm/opensm/osm_inform.c index 8108213..5f48376 100644 --- a/opensm/opensm/osm_inform.c +++ b/opensm/opensm/osm_inform.c @@ -460,18 +460,16 @@ static void match_notice_to_inf_rec(IN cl_list_item_t * p_list_item, } } - /* Check if there is a pkey match. o13-17.1.1 */ - /* Check if the issuer of the trap is the SM. If it is, then the gid + /* In case of GID IN(64) or GID OUT(65) traps the source gid comparison should be done on the trap source (saved as the gid in the data details field). - If the issuer gid is not the SM - then it is the guid of the trap - source */ - if ((cl_ntoh64(p_ntc-issuer_gid.unicast.prefix) == -p_subn-opt.subnet_prefix) -(cl_ntoh64(p_ntc-issuer_gid.unicast.interface_id) == - p_subn-sm_port_guid)) - /* The issuer is the SM then this is trap 64-67 - compare the gid - with the gid saved on the data details */ + In all other cases the issuer gis is the trap source. + This is also the case for MC CREATE(66) and MC DELETE(67) where the + data details gid is MGID */ + if (p_ntc-g_or_v.generic.trap_num == 64 || + p_ntc-g_or_v.generic.trap_num == 65 ) + /* The issuer of these traps is the SM so source_gid + is the gid saved on the data details */ source_gid = p_ntc-data_details.ntc_64_67.gid; else source_gid = p_ntc-issuer_gid; @@ -495,6 +493,7 @@ static void match_notice_to_inf_rec(IN cl_list_item_t * p_list_item, goto Exit; } + /* Check if there is a pkey match. o13-17.1.1 */ if (osm_port_share_pkey(p_log, p_src_port, p_dest_port) == FALSE) { OSM_LOG(p_log, OSM_LOG_DEBUG, Mismatch by Pkey\n); /* According to o13-17.1.2 - If this informInfo does not have -- 1.5.5 -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] opensm: Add update_desc command to opensm console
Sometimes nodes are discovered by the opensm with their default name before real host name was resolves and updated in the HCA. Since Mellanox ConnectX HCA do not support NodeDescription change trap (as part of trap 144) we need an option to refresh NodeDescription from console. Signed-off-by: Eli Dorfman e...@voltaire.com --- v2: Revert error (ERR 3319) message deletion opensm/opensm/osm_console.c | 16 opensm/opensm/osm_state_mgr.c | 53 +++-- 2 files changed, 56 insertions(+), 13 deletions(-) diff --git a/opensm/opensm/osm_console.c b/opensm/opensm/osm_console.c index 206e7f7..e90cb99 100644 --- a/opensm/opensm/osm_console.c +++ b/opensm/opensm/osm_console.c @@ -56,6 +56,8 @@ #include opensm/osm_perfmgr.h #include opensm/osm_subnet.h +extern void osm_update_node_desc(IN osm_sm_t *sm); + struct command { char *name; void (*help_function) (FILE * out, int detail); @@ -207,6 +209,14 @@ static void help_dump_conf(FILE *out, int detail) } } +static void help_update_desc(FILE *out, int detail) +{ + fprintf(out, update_desc\n); + if (detail) { + fprintf(out, update node description for all nodes\n); + } +} + #ifdef ENABLE_OSM_PERF_MGR static void help_perfmgr(FILE * out, int detail) { @@ -1134,6 +1144,11 @@ static void dump_conf_parse(char **p_last, osm_opensm_t * p_osm, FILE * out) osm_subn_output_conf(out, p_osm-subn.opt); } +static void update_desc_parse(char **p_last, osm_opensm_t * p_osm, FILE * out) +{ + osm_update_node_desc(p_osm-sm); +} + #ifdef ENABLE_OSM_PERF_MGR static void perfmgr_parse(char **p_last, osm_opensm_t * p_osm, FILE * out) { @@ -1326,6 +1341,7 @@ static const struct command console_cmds[] = { {switchbalance, help_switchbalance, switchbalance_parse}, {lidbalance, help_lidbalance, lidbalance_parse}, {dump_conf, help_dump_conf, dump_conf_parse}, + {update_desc, help_update_desc, update_desc_parse}, {version, help_version, version_parse}, #ifdef ENABLE_OSM_PERF_MGR {perfmgr, help_perfmgr, perfmgr_parse}, diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c index 6296f21..86aaff2 100644 --- a/opensm/opensm/osm_state_mgr.c +++ b/opensm/opensm/osm_state_mgr.c @@ -510,11 +510,7 @@ static void query_sm_info(cl_map_item_t * item, void *cxt) ib_get_err_str(ret)); } -/** - During a light sweep, check each node to see if the node description - is valid and if not issue a ND query. -**/ -static void state_mgr_get_node_desc(IN cl_map_item_t * obj, IN void *context) +static void state_mgr_update_node_desc(IN cl_map_item_t * obj, IN void *context) { osm_madw_context_t mad_context; osm_node_t *p_node = (osm_node_t *) obj; @@ -527,14 +523,8 @@ static void state_mgr_get_node_desc(IN cl_map_item_t * obj, IN void *context) CL_ASSERT(p_node); - if (p_node-print_desc -strcmp(p_node-print_desc, OSM_NODE_DESC_UNKNOWN)) - /* if ND is valid, do nothing */ - goto exit; - - OSM_LOG(sm-p_log, OSM_LOG_ERROR, - ERR 3319: Unknown node description for node GUID - 0x%016 PRIx64 . Reissuing ND query\n, + OSM_LOG(sm-p_log, OSM_LOG_DEBUG, + Updating NodeDesc for 0x%016 PRIx64 \n, cl_ntoh64(osm_node_get_node_guid(p_node))); /* get a physp to request from. */ @@ -563,6 +553,43 @@ exit: OSM_LOG_EXIT(sm-p_log); } +void osm_update_node_desc(IN osm_sm_t *sm) +{ + CL_PLOCK_ACQUIRE(sm-p_lock); + cl_qmap_apply_func(sm-p_subn-node_guid_tbl, state_mgr_update_node_desc, + sm); + CL_PLOCK_RELEASE(sm-p_lock); +} + +/** + During a light sweep, check each node to see if the node description + is valid and if not issue a ND query. +**/ +static void state_mgr_get_node_desc(IN cl_map_item_t * obj, IN void *context) +{ + osm_node_t *p_node = (osm_node_t *) obj; + osm_sm_t *sm = context; + + OSM_LOG_ENTER(sm-p_log); + + CL_ASSERT(p_node); + + if (p_node-print_desc +strcmp(p_node-print_desc, OSM_NODE_DESC_UNKNOWN)) + /* if ND is valid, do nothing */ + goto exit; + + OSM_LOG(sm-p_log, OSM_LOG_ERROR, + ERR 3319: Unknown node description for node GUID + 0x%016 PRIx64 . Reissuing ND query\n, + cl_ntoh64(osm_node_get_node_guid(p_node))); + + state_mgr_update_node_desc(obj, context); + +exit: + OSM_LOG_EXIT(sm-p_log); +} + /**
[PATCH] opensm: Add update_desc command to opensm console
Add update_desc command to opensm console Sometimes nodes are discovered by the opensm with their default name before real host name was resolves and updated in the HCA. Since Mellanox ConnectX HCA do not support NodeDescription change trap (as part of trap 144) we need an option to refresh NodeDescription from console. Signed-off-by: Eli Dorfman e...@voltaire.com --- opensm/opensm/osm_console.c | 16 + opensm/opensm/osm_state_mgr.c | 48 +--- 2 files changed, 51 insertions(+), 13 deletions(-) diff --git a/opensm/opensm/osm_console.c b/opensm/opensm/osm_console.c index 206e7f7..e90cb99 100644 --- a/opensm/opensm/osm_console.c +++ b/opensm/opensm/osm_console.c @@ -56,6 +56,8 @@ #include opensm/osm_perfmgr.h #include opensm/osm_subnet.h +extern void osm_update_node_desc(IN osm_sm_t *sm); + struct command { char *name; void (*help_function) (FILE * out, int detail); @@ -207,6 +209,14 @@ static void help_dump_conf(FILE *out, int detail) } } +static void help_update_desc(FILE *out, int detail) +{ + fprintf(out, update_desc\n); + if (detail) { + fprintf(out, update node description for all nodes\n); + } +} + #ifdef ENABLE_OSM_PERF_MGR static void help_perfmgr(FILE * out, int detail) { @@ -1134,6 +1144,11 @@ static void dump_conf_parse(char **p_last, osm_opensm_t * p_osm, FILE * out) osm_subn_output_conf(out, p_osm-subn.opt); } +static void update_desc_parse(char **p_last, osm_opensm_t * p_osm, FILE * out) +{ + osm_update_node_desc(p_osm-sm); +} + #ifdef ENABLE_OSM_PERF_MGR static void perfmgr_parse(char **p_last, osm_opensm_t * p_osm, FILE * out) { @@ -1326,6 +1341,7 @@ static const struct command console_cmds[] = { {switchbalance, help_switchbalance, switchbalance_parse}, {lidbalance, help_lidbalance, lidbalance_parse}, {dump_conf, help_dump_conf, dump_conf_parse}, + {update_desc, help_update_desc, update_desc_parse}, {version, help_version, version_parse}, #ifdef ENABLE_OSM_PERF_MGR {perfmgr, help_perfmgr, perfmgr_parse}, diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c index 6296f21..01dfba6 100644 --- a/opensm/opensm/osm_state_mgr.c +++ b/opensm/opensm/osm_state_mgr.c @@ -510,11 +510,7 @@ static void query_sm_info(cl_map_item_t * item, void *cxt) ib_get_err_str(ret)); } -/** - During a light sweep, check each node to see if the node description - is valid and if not issue a ND query. -**/ -static void state_mgr_get_node_desc(IN cl_map_item_t * obj, IN void *context) +static void state_mgr_update_node_desc(IN cl_map_item_t * obj, IN void *context) { osm_madw_context_t mad_context; osm_node_t *p_node = (osm_node_t *) obj; @@ -527,14 +523,8 @@ static void state_mgr_get_node_desc(IN cl_map_item_t * obj, IN void *context) CL_ASSERT(p_node); - if (p_node-print_desc -strcmp(p_node-print_desc, OSM_NODE_DESC_UNKNOWN)) - /* if ND is valid, do nothing */ - goto exit; - - OSM_LOG(sm-p_log, OSM_LOG_ERROR, - ERR 3319: Unknown node description for node GUID - 0x%016 PRIx64 . Reissuing ND query\n, + OSM_LOG(sm-p_log, OSM_LOG_DEBUG, + Updating NodeDesc for 0x%016 PRIx64 \n, cl_ntoh64(osm_node_get_node_guid(p_node))); /* get a physp to request from. */ @@ -563,6 +553,38 @@ exit: OSM_LOG_EXIT(sm-p_log); } +void osm_update_node_desc(IN osm_sm_t *sm) +{ + CL_PLOCK_ACQUIRE(sm-p_lock); + cl_qmap_apply_func(sm-p_subn-node_guid_tbl, state_mgr_update_node_desc, + sm); + CL_PLOCK_RELEASE(sm-p_lock); +} + +/** + During a light sweep, check each node to see if the node description + is valid and if not issue a ND query. +**/ +static void state_mgr_get_node_desc(IN cl_map_item_t * obj, IN void *context) +{ + osm_node_t *p_node = (osm_node_t *) obj; + osm_sm_t *sm = context; + + OSM_LOG_ENTER(sm-p_log); + + CL_ASSERT(p_node); + + if (p_node-print_desc +strcmp(p_node-print_desc, OSM_NODE_DESC_UNKNOWN)) + /* if ND is valid, do nothing */ + goto exit; + + state_mgr_update_node_desc(obj, context); + +exit: + OSM_LOG_EXIT(sm-p_log); +} + /** Initiates a lightweight sweep of the subnet. Used during normal sweeps after the subnet is up. -- 1.5.5 -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More
Re: [PATCH] opensm/osm_state_mgr.c: force heavy sweep when fabric consists of single switch
Yevgeny Kliteynik wrote: Eli Dorfman (Voltaire) wrote: Yevgeny Kliteynik wrote: Eli Dorfman (Voltaire) wrote: Yevgeny Kliteynik wrote: Eli Dorfman (Voltaire) wrote: Yevgeny Kliteynik wrote: Yevgeny Kliteynik wrote: Line Holen wrote: On 11/ 4/09 04:54 PM, Yevgeny Kliteynik wrote: Line Holen wrote: On 11/ 4/09 10:47 AM, Yevgeny Kliteynik wrote: Sasha Khapyorsky wrote: On 12:26 Tue 03 Nov , Yevgeny Kliteynik wrote: Always do heavy sweep when there is only one node in the fabric, and this node is a switch, and SM runs on top of it - there may be a race when OSM starts running before the external ports are ports are up, or if they went through reset while SM was starting. In this race switch brings up the ports and turns on the PSC bit, but OSM might get PortInfo before SwitchInfo, and it might see all ports as down, but PSC bit on. If that happens, OSM turns off PSC bit, and it will never see external ports again - it won't perform any heavy sweep, only light sweep Could such race happen when there are more than one node in a fabric? I think that my description of the race was misleading. The race can happen on *any* fabric when SM runs on switch. But when it does happen, SM thinks that the whole subnet is just one switch - that's what it managed to discover. I've actually seen it happening. So the patch fixes this particular case. So the next question that you would probably ask is can this race happen on some *other* switch and not the one SM is running on? Well, I don't know. I have a hunch that it can't, but I couldn't prove it to myself yet. The race on the managed switch is a special case because SM always sees port 0, and always gets responses to its SMP queries. On any other switch, if the ports were reset, SM won't get any response until the ports are up again. Perhaps there might be a case where SM got some port as down, and by the time SM got SwitchInfo with PSC bit the port was already up, so SM won't start discovery beyond this port. But this race would be fixed on the next heavy sweep, when SM will discover this port that it missed the previous time, whereas race on managed switch is fatal - SM won't ever do any heavy sweep. -- Yevgeny At least for the 3.2 branch there is a general race regardless of where the SM is running. I haven't checked the current master, but I cannot recall seeing any patches related to this so I assume the race is still there. There is a window between SM discovering a switch and clearing PSC for the same switch. The SM will not detect a state change on the switch ports during this time. If the port changes state during that period, the switch issues new trap 128, which (I think) should cause SM to re-discover the fabric once this discovery cycle is over. Is this correct? I think the switch shall send a trap whenever it sets the PSC bit. Once set I believe it will not send another trap until it is reset. Or do I misinterpret the spec ? I may be wrong, but I thought that this is how things work: - port state changes - switch turns on PSC bit and starts sending traps - SM gets the trap, sends trap repress - switch gets trap repress and stops sending traps - PSC is still on - port state changes again (the same or any other port) - switch turns on PSC bit (which doesn't matter as PSC is already on) and starts sending traps again - etc... Anyway, I'll double-check this issue. Yep, verified. Switch sends traps regardless the PSC bit status. Also, the spec doesn't link them together: o14-5.1.1: If a switch supports Traps (PortInfo: CapabilityMask.IsTrap-Supported is one), its SMA shall send trap 128 to the SM indicated by the PortInfo:MasterSMLID under any condition that would cause SwitchInfo:PortStateChange to be set to one. (See 14.2.5.4 SwitchInfo on page 827.) Trap will be sent according to the SMLID. After first bring up the SMLID is not set yet and trap will not be sent. In that case the opensm would discover the change only by PSC bit. For IS3 chips the PSC bit and/or trap were set only after one or more ports changed their state, so I don't understand how can the SM discover PSC bit set while all ports are down. Or is this a change in IS4? It can happen when SM runs on the switch, not not host. In this case if all ports are going down, SM will see them all down and it will see PSC bit on. So this patch is only for SM running on a switch which is the only node in the fabric? I don't see the race when there is more than one switch - please explain. Quoting from above: The race can happen on *any* fabric when SM runs on switch. But when it does happen, SM thinks that the whole subnet is just one switch - that's what it managed to discover. I saw that but I don't understand how this can happen. If PSC bit is set after *every* port state change and SM clears PSC bit before reading PortInfo from the switch, osm_node_info_rcv.c
Re: [PATCH] opensm/osm_state_mgr.c: force heavy sweep when fabric consists of single switch
Yevgeny Kliteynik wrote: Eli Dorfman (Voltaire) wrote: Yevgeny Kliteynik wrote: Yevgeny Kliteynik wrote: Line Holen wrote: On 11/ 4/09 04:54 PM, Yevgeny Kliteynik wrote: Line Holen wrote: On 11/ 4/09 10:47 AM, Yevgeny Kliteynik wrote: Sasha Khapyorsky wrote: On 12:26 Tue 03 Nov , Yevgeny Kliteynik wrote: Always do heavy sweep when there is only one node in the fabric, and this node is a switch, and SM runs on top of it - there may be a race when OSM starts running before the external ports are ports are up, or if they went through reset while SM was starting. In this race switch brings up the ports and turns on the PSC bit, but OSM might get PortInfo before SwitchInfo, and it might see all ports as down, but PSC bit on. If that happens, OSM turns off PSC bit, and it will never see external ports again - it won't perform any heavy sweep, only light sweep Could such race happen when there are more than one node in a fabric? I think that my description of the race was misleading. The race can happen on *any* fabric when SM runs on switch. But when it does happen, SM thinks that the whole subnet is just one switch - that's what it managed to discover. I've actually seen it happening. So the patch fixes this particular case. So the next question that you would probably ask is can this race happen on some *other* switch and not the one SM is running on? Well, I don't know. I have a hunch that it can't, but I couldn't prove it to myself yet. The race on the managed switch is a special case because SM always sees port 0, and always gets responses to its SMP queries. On any other switch, if the ports were reset, SM won't get any response until the ports are up again. Perhaps there might be a case where SM got some port as down, and by the time SM got SwitchInfo with PSC bit the port was already up, so SM won't start discovery beyond this port. But this race would be fixed on the next heavy sweep, when SM will discover this port that it missed the previous time, whereas race on managed switch is fatal - SM won't ever do any heavy sweep. -- Yevgeny At least for the 3.2 branch there is a general race regardless of where the SM is running. I haven't checked the current master, but I cannot recall seeing any patches related to this so I assume the race is still there. There is a window between SM discovering a switch and clearing PSC for the same switch. The SM will not detect a state change on the switch ports during this time. If the port changes state during that period, the switch issues new trap 128, which (I think) should cause SM to re-discover the fabric once this discovery cycle is over. Is this correct? I think the switch shall send a trap whenever it sets the PSC bit. Once set I believe it will not send another trap until it is reset. Or do I misinterpret the spec ? I may be wrong, but I thought that this is how things work: - port state changes - switch turns on PSC bit and starts sending traps - SM gets the trap, sends trap repress - switch gets trap repress and stops sending traps - PSC is still on - port state changes again (the same or any other port) - switch turns on PSC bit (which doesn't matter as PSC is already on) and starts sending traps again - etc... Anyway, I'll double-check this issue. Yep, verified. Switch sends traps regardless the PSC bit status. Also, the spec doesn't link them together: o14-5.1.1: If a switch supports Traps (PortInfo: CapabilityMask.IsTrap-Supported is one), its SMA shall send trap 128 to the SM indicated by the PortInfo:MasterSMLID under any condition that would cause SwitchInfo:PortStateChange to be set to one. (See 14.2.5.4 SwitchInfo on page 827.) Trap will be sent according to the SMLID. After first bring up the SMLID is not set yet and trap will not be sent. In that case the opensm would discover the change only by PSC bit. For IS3 chips the PSC bit and/or trap were set only after one or more ports changed their state, so I don't understand how can the SM discover PSC bit set while all ports are down. Or is this a change in IS4? It can happen when SM runs on the switch, not not host. In this case if all ports are going down, SM will see them all down and it will see PSC bit on. So this patch is only for SM running on a switch which is the only node in the fabric? I don't see the race when there is more than one switch - please explain. Also AFAIK the PSC bit is set only after any physical port state change. So if we clear the PSC bit and only then get PortInfo we will still catch any new state change. right? Eli -- Yevgeny Eli -- Yevgeny -- Yevgeny Or perhaps the more serious problem happens when SM LID is not configured yet on the switch, hence the trap is not going to the right place? I have a patch for the 3.2 branch that I can merge into master. Sure, that would be nice :) -- Yevgeny Line Sasha Signed-off
Re: [PATCH] opensm/osm_state_mgr.c: force heavy sweep when fabric consists of single switch
Yevgeny Kliteynik wrote: Yevgeny Kliteynik wrote: Line Holen wrote: On 11/ 4/09 04:54 PM, Yevgeny Kliteynik wrote: Line Holen wrote: On 11/ 4/09 10:47 AM, Yevgeny Kliteynik wrote: Sasha Khapyorsky wrote: On 12:26 Tue 03 Nov , Yevgeny Kliteynik wrote: Always do heavy sweep when there is only one node in the fabric, and this node is a switch, and SM runs on top of it - there may be a race when OSM starts running before the external ports are ports are up, or if they went through reset while SM was starting. In this race switch brings up the ports and turns on the PSC bit, but OSM might get PortInfo before SwitchInfo, and it might see all ports as down, but PSC bit on. If that happens, OSM turns off PSC bit, and it will never see external ports again - it won't perform any heavy sweep, only light sweep Could such race happen when there are more than one node in a fabric? I think that my description of the race was misleading. The race can happen on *any* fabric when SM runs on switch. But when it does happen, SM thinks that the whole subnet is just one switch - that's what it managed to discover. I've actually seen it happening. So the patch fixes this particular case. So the next question that you would probably ask is can this race happen on some *other* switch and not the one SM is running on? Well, I don't know. I have a hunch that it can't, but I couldn't prove it to myself yet. The race on the managed switch is a special case because SM always sees port 0, and always gets responses to its SMP queries. On any other switch, if the ports were reset, SM won't get any response until the ports are up again. Perhaps there might be a case where SM got some port as down, and by the time SM got SwitchInfo with PSC bit the port was already up, so SM won't start discovery beyond this port. But this race would be fixed on the next heavy sweep, when SM will discover this port that it missed the previous time, whereas race on managed switch is fatal - SM won't ever do any heavy sweep. -- Yevgeny At least for the 3.2 branch there is a general race regardless of where the SM is running. I haven't checked the current master, but I cannot recall seeing any patches related to this so I assume the race is still there. There is a window between SM discovering a switch and clearing PSC for the same switch. The SM will not detect a state change on the switch ports during this time. If the port changes state during that period, the switch issues new trap 128, which (I think) should cause SM to re-discover the fabric once this discovery cycle is over. Is this correct? I think the switch shall send a trap whenever it sets the PSC bit. Once set I believe it will not send another trap until it is reset. Or do I misinterpret the spec ? I may be wrong, but I thought that this is how things work: - port state changes - switch turns on PSC bit and starts sending traps - SM gets the trap, sends trap repress - switch gets trap repress and stops sending traps - PSC is still on - port state changes again (the same or any other port) - switch turns on PSC bit (which doesn't matter as PSC is already on) and starts sending traps again - etc... Anyway, I'll double-check this issue. Yep, verified. Switch sends traps regardless the PSC bit status. Also, the spec doesn't link them together: o14-5.1.1: If a switch supports Traps (PortInfo: CapabilityMask.IsTrap-Supported is one), its SMA shall send trap 128 to the SM indicated by the PortInfo:MasterSMLID under any condition that would cause SwitchInfo:PortStateChange to be set to one. (See 14.2.5.4 SwitchInfo on page 827.) Trap will be sent according to the SMLID. After first bring up the SMLID is not set yet and trap will not be sent. In that case the opensm would discover the change only by PSC bit. For IS3 chips the PSC bit and/or trap were set only after one or more ports changed their state, so I don't understand how can the SM discover PSC bit set while all ports are down. Or is this a change in IS4? Eli -- Yevgeny -- Yevgeny Or perhaps the more serious problem happens when SM LID is not configured yet on the switch, hence the trap is not going to the right place? I have a patch for the 3.2 branch that I can merge into master. Sure, that would be nice :) -- Yevgeny Line Sasha Signed-off-by: Yevgeny Kliteynik klit...@dev.mellanox.co.il --- opensm/opensm/osm_state_mgr.c | 15 ++- 1 files changed, 10 insertions(+), 5 deletions(-) diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c index 4303d6e..537c855 100644 --- a/opensm/opensm/osm_state_mgr.c +++ b/opensm/opensm/osm_state_mgr.c @@ -1062,13 +1062,18 @@ static void do_sweep(osm_sm_t * sm) * Otherwise, this is probably our first discovery pass * or we are connected in loopback. In both cases do a * heavy sweep. - * Note: If we are
Re: [PATCH] infiniband-diags/saquery: Report SA MAD Class specific status.
Sasha Khapyorsky wrote: On 10:09 Sun 01 Nov , Eli Dorfman (Voltaire) wrote: Report SA MAD Class specific status. Fixes wrong error report for SA query status. I agree with patch, but one comment is below. Signed-off-by: Eli Dorfman e...@voltaire.com --- infiniband-diags/src/saquery.c | 41 --- 1 files changed, 37 insertions(+), 4 deletions(-) diff --git a/infiniband-diags/src/saquery.c b/infiniband-diags/src/saquery.c index 6c44b63..71823d5 100644 --- a/infiniband-diags/src/saquery.c +++ b/infiniband-diags/src/saquery.c @@ -124,6 +124,41 @@ int requested_lid_flag = 0; uint64_t requested_guid = 0; int requested_guid_flag = 0; +#define SA_ERR_UNKNOWN IB_SA_MAD_STATUS_PRIO_SUGGESTED + +const char *ib_sa_error_str[] = { +SA_NO_ERROR, +SA_ERR_NO_RESOURCES, +SA_ERR_REQ_INVALID, +SA_ERR_NO_RECORDS, +SA_ERR_TOO_MANY_RECORDS, +SA_ERR_REQ_INVALID_GID, +SA_ERR_REQ_INSUFFICIENT_COMPONENTS, +SA_ERR_REQ_DENIED, +SA_ERR_STATUS_PRIO_SUGGESTED, +SA_ERR_UNKNOWN +}; + +static inline const char *ib_sa_err_str(IN uint8_t status) +{ +if (status SA_ERR_UNKNOWN) +status = SA_ERR_UNKNOWN; +return (ib_sa_error_str[status]); +} + +static inline void report_err(int status) +{ +int st = status 0xff; + +if (st) +fprintf(stderr, ERROR: Query result returned: %s (0x%x)\n, +ib_get_err_str(st), status); +st = status 8; +if (st) +fprintf(stderr, ERROR: Query result returned: %s (0x%x)\n, +ib_sa_err_str(st), status); Such two identical messages with different error strings seems confusing to me. Wouldn't it be better to merge it in a single line, like: ERROR: Query result returned 0x: SM blah1 , SA blah2 (or similar), with making each part optional. I agree. Is it possible according to the spec to have both SM and SA (i don't think so) Eli Sasha +} + static int sa_query(struct bind_handle *h, uint8_t method, uint16_t attr, uint32_t mod, uint64_t comp_mask, uint64_t sm_key, void *data) @@ -794,8 +829,7 @@ static int get_any_records(bind_handle_t h, } if (result.status != IB_SUCCESS) { -fprintf(stderr, Query result returned: %s\n, -ib_get_err_str(result.status)); +report_err(result.status); return result.status; } @@ -1009,8 +1043,7 @@ static int get_print_class_port_info(bind_handle_t h) return ret; } if (result.status != IB_SUCCESS) { -fprintf(stderr, ERROR: Query result returned: %s\n, -ib_get_err_str(result.status)); +report_err(result.status); return (result.status); } dump_results(result, dump_class_port_info); -- 1.5.5 -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] infiniband-diags/saquery: Report SA MAD Class specific status.
Report SA MAD Class specific status. In addition to SM status. Signed-off-by: Eli Dorfman e...@voltaire.com --- infiniband-diags/src/saquery.c | 45 --- 1 files changed, 41 insertions(+), 4 deletions(-) diff --git a/infiniband-diags/src/saquery.c b/infiniband-diags/src/saquery.c index 6c44b63..9495cd9 100644 --- a/infiniband-diags/src/saquery.c +++ b/infiniband-diags/src/saquery.c @@ -124,6 +124,45 @@ int requested_lid_flag = 0; uint64_t requested_guid = 0; int requested_guid_flag = 0; +#define SA_ERR_UNKNOWN IB_SA_MAD_STATUS_PRIO_SUGGESTED + +const char *ib_sa_error_str[] = { + SA_NO_ERROR, + SA_ERR_NO_RESOURCES, + SA_ERR_REQ_INVALID, + SA_ERR_NO_RECORDS, + SA_ERR_TOO_MANY_RECORDS, + SA_ERR_REQ_INVALID_GID, + SA_ERR_REQ_INSUFFICIENT_COMPONENTS, + SA_ERR_REQ_DENIED, + SA_ERR_STATUS_PRIO_SUGGESTED, + SA_ERR_UNKNOWN +}; + +static inline const char *ib_sa_err_str(IN uint8_t status) +{ + if (status SA_ERR_UNKNOWN) + status = SA_ERR_UNKNOWN; + return (ib_sa_error_str[status]); +} + +static inline void report_err(int status) +{ + int st = status 0xff; + char sm_err_str[64] = { 0 }; + char sa_err_str[64] = { 0 }; + + if (st) + sprintf(sm_err_str, SM(%s), ib_get_err_str(st)); + + st = status 8; + if (st) + sprintf(sa_err_str, SA(%s), ib_sa_err_str(st)); + + fprintf(stderr, ERROR: Query result returned 0x%04x, %s%s\n, + status, sm_err_str, sa_err_str); +} + static int sa_query(struct bind_handle *h, uint8_t method, uint16_t attr, uint32_t mod, uint64_t comp_mask, uint64_t sm_key, void *data) @@ -794,8 +833,7 @@ static int get_any_records(bind_handle_t h, } if (result.status != IB_SUCCESS) { - fprintf(stderr, Query result returned: %s\n, - ib_get_err_str(result.status)); + report_err(result.status); return result.status; } @@ -1009,8 +1047,7 @@ static int get_print_class_port_info(bind_handle_t h) return ret; } if (result.status != IB_SUCCESS) { - fprintf(stderr, ERROR: Query result returned: %s\n, - ib_get_err_str(result.status)); + report_err(result.status); return (result.status); } dump_results(result, dump_class_port_info); -- 1.5.5 -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] infiniband-diags/saquery: Report SA MAD Class specific status.
Report SA MAD Class specific status. Fixes wrong error report for SA query status. Signed-off-by: Eli Dorfman e...@voltaire.com --- infiniband-diags/src/saquery.c | 41 --- 1 files changed, 37 insertions(+), 4 deletions(-) diff --git a/infiniband-diags/src/saquery.c b/infiniband-diags/src/saquery.c index 6c44b63..71823d5 100644 --- a/infiniband-diags/src/saquery.c +++ b/infiniband-diags/src/saquery.c @@ -124,6 +124,41 @@ int requested_lid_flag = 0; uint64_t requested_guid = 0; int requested_guid_flag = 0; +#define SA_ERR_UNKNOWN IB_SA_MAD_STATUS_PRIO_SUGGESTED + +const char *ib_sa_error_str[] = { + SA_NO_ERROR, + SA_ERR_NO_RESOURCES, + SA_ERR_REQ_INVALID, + SA_ERR_NO_RECORDS, + SA_ERR_TOO_MANY_RECORDS, + SA_ERR_REQ_INVALID_GID, + SA_ERR_REQ_INSUFFICIENT_COMPONENTS, + SA_ERR_REQ_DENIED, + SA_ERR_STATUS_PRIO_SUGGESTED, + SA_ERR_UNKNOWN +}; + +static inline const char *ib_sa_err_str(IN uint8_t status) +{ + if (status SA_ERR_UNKNOWN) + status = SA_ERR_UNKNOWN; + return (ib_sa_error_str[status]); +} + +static inline void report_err(int status) +{ + int st = status 0xff; + + if (st) + fprintf(stderr, ERROR: Query result returned: %s (0x%x)\n, + ib_get_err_str(st), status); + st = status 8; + if (st) + fprintf(stderr, ERROR: Query result returned: %s (0x%x)\n, + ib_sa_err_str(st), status); +} + static int sa_query(struct bind_handle *h, uint8_t method, uint16_t attr, uint32_t mod, uint64_t comp_mask, uint64_t sm_key, void *data) @@ -794,8 +829,7 @@ static int get_any_records(bind_handle_t h, } if (result.status != IB_SUCCESS) { - fprintf(stderr, Query result returned: %s\n, - ib_get_err_str(result.status)); + report_err(result.status); return result.status; } @@ -1009,8 +1043,7 @@ static int get_print_class_port_info(bind_handle_t h) return ret; } if (result.status != IB_SUCCESS) { - fprintf(stderr, ERROR: Query result returned: %s\n, - ib_get_err_str(result.status)); + report_err(result.status); return (result.status); } dump_results(result, dump_class_port_info); -- 1.5.5 -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
hop weighted routing
Hi, I have tried to use hop weighted routing and noticed that it is supported only for minhop. Is there a reason that this was not implemented for up-down as well? Thanks, Eli -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Default force_link_speed value in opensm configuration
Hi, Is there a reason for setting default force_link_speed to 15 (set enabled as supported) and not to 0 (don't modify PortInfo:LinkSpeedEnabled )? Thanks, Eli -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ofa-general] Re: [PATCH v2] infiniband-diags/scripts: Add 'ibcheckspeed' and 'ibcheckportspeed' to scripts
Hal Rosenstock wrote: On Mon, Sep 14, 2009 at 2:02 PM, Ira Weiny wei...@llnl.gov mailto:wei...@llnl.gov wrote: On Fri, 11 Sep 2009 09:32:39 +0530 Keshetti Mahesh keshetti.mah...@gmail.com mailto:keshetti.mah...@gmail.com wrote: My badness. I have not used 'iblinkinfo' before. So, I guess there is no need for the above script. Apart from that, I feel there should be a program/script which will first scan the fabric to find the maximum common supported width/speed and then report the warning messages of the links/ports which are configured with active width/speed less than the found value. Is there any tool already exists which does the same ? Not that I know of. ibportstate does this but is on a per port basis. This could be readily scripted (ad hoc or in tree) for this purpose. But it would be very slow for large fabrics. I think it would be better to add this option to iblinkinfo code. Also it would be useful to find all ports in Disable state. Eli -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ofa-general] Re: [PATCH v2] infiniband-diags/scripts: Add 'ibcheckspeed' and 'ibcheckportspeed' to scripts
Keshetti Mahesh wrote: Also it would be useful to find all ports in Disable state. iblinkinfo already does this with '-d' option. It shows all the port that are in Down state - either cable disconnected or port disabled (by ibportstate). Eli -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html