I'm still getting warnings about OMPI_ENABLE_DYNAMIC_SL not being defined, even though I see this in my configure output:
checking if can use dynamic SL support... yes

That's why I wanted that macro *always* defined by the m4 (to either 0 or 1) -- not just defined or left undefined. This is one of the OMPI coding guidelines: always define logical preprocessor macros to 0 or 1, not define-them-or-not. If you always define them, then you get a preprocessor warning if you misspell a macro name. E.g.:

    #if MISSPELLED_MACRO_NAME

will produce a warning because the macro doesn't exist. But if you just define-the-macro-or-not and use

    #ifdef MISSPELLED_MACRO_NAME

then you won't get a warning, and the misspelled macro name in the code may turn into a difficult-to-find bug.

Also, note that the 4-argument version of AC_ARG_ENABLE isn't really necessary:

    AC_ARG_ENABLE([openib-dynamic-sl],
        [AC_HELP_STRING([--enable-openib-dynamic-sl],
            [Enable openib BTL to query Subnet Manager for IB SL (default: enabled)])],
        [enable_openib_dynamic_sl="$enableval"],
        [enable_openib_dynamic_sl="not_provided"])

You can shorten it to:

    AC_ARG_ENABLE([openib-dynamic-sl],
        [AC_HELP_STRING([--enable-openib-dynamic-sl],
            [Enable openib BTL to query Subnet Manager for IB SL (default: enabled)])])

because $enable_openib_dynamic_sl will automatically be set to "yes" (when --enable-openib-dynamic-sl is used), "no" (when --disable-openib-dynamic-sl is used), or "" (when neither is used). So there's no need to manually set this variable in args 3 and 4 -- it's set for you automatically. I know we have some of the older 4-arg forms in ompi_check_openib.m4, but they're just old and haven't been updated. You probably don't want to introduce *new* 4-arg usage.

Finally, are you sure that infiniband/complib/cl_types_osd.h exists on all platforms (e.g., Solaris)? I know you said you don't have any Solaris machines to test with, but you should ping Oracle directly for some testing -- Terry might not be paying attention to this specific thread...
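To make the 0-or-1 guideline concrete, here is a minimal sketch in C. It is not code from the tree: the hard-coded #define stands in for the value that configure would normally write into the generated config header, and the misspelled name OMPI_ENABLE_DYNAMC_SL and both function names are made up for illustration. The "is not defined" warnings in the make output in this mail are the kind GCC prints for an #if on an unknown name when -Wundef is on.

```c
#include <assert.h>  /* static_assert (C11) */

/* Stand-in for the configure-generated header. Assumption: a real build
 * gets this from AC_DEFINE_UNQUOTED, always as 0 or 1. */
#define OMPI_ENABLE_DYNAMIC_SL 1

/* Reject any value other than 0 or 1 at compile time. */
static_assert(OMPI_ENABLE_DYNAMIC_SL == 0 || OMPI_ENABLE_DYNAMIC_SL == 1,
              "OMPI_ENABLE_DYNAMIC_SL must be 0 or 1");

/* Value test: if the name were misspelled here, a compiler run with
 * -Wundef would warn that the macro is not defined. */
int dynamic_sl_enabled(void)
{
#if OMPI_ENABLE_DYNAMIC_SL
    return 1;
#else
    return 0;
#endif
}

/* Existence test with a deliberately misspelled (hypothetical) name:
 * this compiles without any warning and always takes the #else branch,
 * even though the feature is enabled above. */
int dynamic_sl_enabled_typo(void)
{
#ifdef OMPI_ENABLE_DYNAMC_SL
    return 1;
#else
    return 0;
#endif
}
```

With the stand-in set to 1, dynamic_sl_enabled() returns 1 while dynamic_sl_enabled_typo() returns 0 without a single diagnostic, which is the hard-to-find bug the guideline is meant to prevent.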
-----

Here are the warnings I'm seeing -- did you remove the AC_DEFINE for OMPI_ENABLE_DYNAMIC_SL altogether by accident?

[4:13] svbu-mpi:~/svn/ompi5/ompi/mca/btl/openib % make
  CC     btl_openib.lo
In file included from btl_openib_ini.h:16:0,
                 from btl_openib.c:47:
btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
  CC     btl_openib_component.lo
In file included from btl_openib_component.c:80:0:
btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
  CC     btl_openib_endpoint.lo
In file included from btl_openib_endpoint.h:32:0,
                 from btl_openib_endpoint.c:46:
btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
  CC     btl_openib_frag.lo
In file included from btl_openib_frag.c:22:0:
btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
  CC     btl_openib_proc.lo
In file included from btl_openib_proc.c:27:0:
btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
  CC     btl_openib_lex.lo
btl_openib_lex.c: In function 'yy_get_next_buffer':
btl_openib_lex.c:1229:3: warning: comparison between signed and unsigned integer expressions
btl_openib_lex.l: At top level:
btl_openib_lex.c:1323:17: warning: 'yyunput' defined but not used
btl_openib_lex.c:1364:16: warning: 'input' defined but not used
  CC     btl_openib_mca.lo
In file included from btl_openib_mca.c:33:0:
btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
btl_openib_mca.c: In function 'btl_openib_register_mca_params':
btl_openib_mca.c:401:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
  CC     btl_openib_ini.lo
In file included from btl_openib_ini.c:35:0:
btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
  CC     btl_openib_async.lo
In file included from btl_openib_async.c:26:0:
btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
  CC     btl_openib_xrc.lo
In file included from btl_openib_xrc.h:14:0,
                 from btl_openib_xrc.c:23:
btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
  CC     btl_openib_fd.lo
  CC     btl_openib_ip.lo
In file included from btl_openib_endpoint.h:32:0,
                 from btl_openib_ip.c:30:
btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
  CC     connect/btl_openib_connect_base.lo
In file included from connect/btl_openib_connect_base.c:13:0:
./btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
  CC     connect/btl_openib_connect_oob.lo
In file included from connect/btl_openib_connect_oob.c:41:0:
./btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
connect/btl_openib_connect_oob.c:47:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
connect/btl_openib_connect_oob.c:65:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
connect/btl_openib_connect_oob.c:115:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
connect/btl_openib_connect_oob.c:271:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
connect/btl_openib_connect_oob.c: In function 'oob_component_finalize':
connect/btl_openib_connect_oob.c:307:7: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
connect/btl_openib_connect_oob.c: In function 'qp_connect_all':
connect/btl_openib_connect_oob.c:396:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
connect/btl_openib_connect_oob.c: At top level:
connect/btl_openib_connect_oob.c:1011:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
  CC     connect/btl_openib_connect_empty.lo
In file included from connect/btl_openib_connect_empty.c:13:0:
./btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
  CC     connect/btl_openib_connect_xoob.lo
In file included from connect/btl_openib_connect_xoob.c:30:0:
./btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
  CC     connect/btl_openib_connect_rdmacm.lo
In file included from ./btl_openib_proc.h:26:0,
                 from connect/btl_openib_connect_rdmacm.c:53:
./btl_openib.h:219:6: warning: "OMPI_ENABLE_DYNAMIC_SL" is not defined
  CCLD   libmca_btl_openib.la
[4:13] svbu-mpi:~/svn/ompi5/ompi/mca/btl/openib % cd

On Jul 3, 2011, at 11:07 AM, Yevgeny Kliteynik wrote:

> Hi Jeff,
>
> On 02-Jul-11 11:52 PM, Jeff Squyres (jsquyres) wrote:
>> Were all the issues with this code fixed? There were m4 issues and solaris
>> issues, IIRC.
>
> I took all the fixes I could find based on the trac:
> "Be sure also to look at r24196; Josh committed a
> bunch of warning fixes for you after r24915"
>
> I also removed all the libibmad dependencies and unneeded macros,
> so I hope that this is OK. However, I don't have any Solaris machine
> to try this to make sure that there are no issues.
>
> The only complaint w.r.t. Solaris that I could find was Terry's
> mail from last week, but it turned out to be a different problem.
>
> Are there any other problems that I'm not aware of?
>
> -- YK
>
>
>> Sent from my phone. No type good.
>>
>> On Jun 28, 2011, at 9:28 AM, "klit...@osl.iu.edu" <klit...@osl.iu.edu> wrote:
>>
>>> Author: kliteyn
>>> Date: 2011-06-28 10:28:29 EDT (Tue, 28 Jun 2011)
>>> New Revision: 24830
>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/24830
>>>
>>> Log:
>>> Supporting dynamic SL (#2674)
>>>
>>> - Added enable/disable configuration parameter for dynamic SL
>>> - All the dynamic SL code is conditionalized
>>> - Removed libibmad dependency
>>> - Using only one include - ib_types.h (part of opensm-devel package)
>>> - Removed all the macro and data types definitions, using the
>>>   existing definitions from ib_types.h instead
>>> - general cleaning here and there
>>>
>>> The async mode is not implemented yet - stay tuned...
>>>
>>>
>>> Text files modified:
>>> trunk/ompi/config/ompi_check_openib.m4 | 38 ++++
>>> trunk/ompi/mca/btl/openib/btl_openib.h | 5
>>> trunk/ompi/mca/btl/openib/btl_openib_mca.c | 10
>>> trunk/ompi/mca/btl/openib/connect/btl_openib_connect_oob.c | 309 +++++++++++++++++----------------------
>>> 4 files changed, 182 insertions(+), 180 deletions(-)
>>>
>>> Modified: trunk/ompi/config/ompi_check_openib.m4
>>> ==============================================================================
>>> --- trunk/ompi/config/ompi_check_openib.m4 (original)
>>> +++ trunk/ompi/config/ompi_check_openib.m4 2011-06-28 10:28:29 EDT (Tue, 28 Jun 2011)
>>> @@ -155,11 +155,21 @@
>>> [$ompi_cv_func_ibv_create_cq_args],
>>> [Number of arguments to ibv_create_cq])])])
>>>
>>> + #
>>> + # OpenIB dynamic SL
>>> + #
>>> + AC_ARG_ENABLE([openib-dynamic-sl],
>>> + [AC_HELP_STRING([--enable-openib-dynamic-sl],
>>> + [Enable openib BTL to query Subnet Manager for IB SL (default: enabled)])],
>>> + [enable_openib_dynamic_sl="$enableval"],
>>> + [enable_openib_dynamic_sl="yes"])
>>> +
>>> # Set these up so that we can do an AC_DEFINE below
>>> # (unconditionally)
>>> $1_have_xrc=0
>>> $1_have_rdmacm=0
>>> $1_have_ibcm=0
>>> + $1_have_dynamic_sl=0
>>>
>>> # If we have the openib stuff available, find out what we've got
>>> AS_IF([test "$ompi_check_openib_happy" = "yes"],
>>> @@ -176,6 +186,19 @@
>>> AC_CHECK_FUNCS([ibv_create_xrc_rcv_qp], [$1_have_xrc=1])
>>> fi
>>>
>>> + if test "$enable_openib_dynamic_sl" = "yes"; then
>>> + # We need ib_types.h file, which is installed with opensm-devel
>>> + # package. However, ib_types.h has a bad include directive,
>>> + # which will cause AC_CHECK_HEADER to fail.
>>> + # So instead, we will look for another file that is also
>>> + # installed as part of opensm-devel package and included in
>>> + # ib_types.h, but it doesn't include any other IB-related files.
>>> + AC_CHECK_HEADER([infiniband/complib/cl_types_osd.h],
>>> + [$1_have_dynamic_sl=1],
>>> + [AC_MSG_ERROR([opensm-devel package not found - please install it or disable dynamic SL support with \"--disable-openib-dynamic-sl\"])],
>>> + [])
>>> + fi
>>> +
>>> # Do we have a recent enough RDMA CM? Need to have the
>>> # rdma_get_peer_addr (inline) function (originally appeared
>>> # in OFED v1.3).
>>> @@ -244,6 +267,15 @@
>>> else
>>> AC_MSG_RESULT([no])
>>> fi
>>> +
>>> + AC_MSG_CHECKING([if dynamic SL is enabled])
>>> + AC_DEFINE_UNQUOTED([OMPI_ENABLE_DYNAMIC_SL], [$$1_have_dynamic_sl],
>>> + [Enable features required for dynamic SL support])
>>> + if test "1" = "$$1_have_dynamic_sl"; then
>>> + AC_MSG_RESULT([yes])
>>> + else
>>> + AC_MSG_RESULT([no])
>>> + fi
>>>
>>> AC_MSG_CHECKING([if OpenFabrics RDMACM support is enabled])
>>> AC_DEFINE_UNQUOTED([OMPI_HAVE_RDMACM], [$$1_have_rdmacm],
>>> @@ -267,7 +299,11 @@
>>> AC_MSG_RESULT([no])
>>> fi
>>>
>>> - CPPFLAGS="$ompi_check_openib_$1_save_CPPFLAGS"
>>> + AS_IF([test -z "$ompi_check_openib_dir"],
>>> + [openib_include_dir="/usr/include"],
>>> + [openib_include_dir="$ompi_check_openib_dir/include"])
>>> +
>>> + CPPFLAGS="$ompi_check_openib_$1_save_CPPFLAGS -I$openib_include_dir/infiniband"
>>> LDFLAGS="$ompi_check_openib_$1_save_LDFLAGS"
>>> LIBS="$ompi_check_openib_$1_save_LIBS"
>>>
>>>
>>> Modified: trunk/ompi/mca/btl/openib/btl_openib.h
>>> ==============================================================================
>>> --- trunk/ompi/mca/btl/openib/btl_openib.h (original)
>>> +++ trunk/ompi/mca/btl/openib/btl_openib.h 2011-06-28 10:28:29 EDT (Tue, 28 Jun 2011)
>>> @@ -52,6 +52,7 @@
>>> BEGIN_C_DECLS
>>>
>>> #define HAVE_XRC (1 == OMPI_HAVE_CONNECTX_XRC)
>>> +#define ENABLE_DYNAMIC_SL (1 == OMPI_ENABLE_DYNAMIC_SL)
>>>
>>> #define MCA_BTL_IB_LEAVE_PINNED 1
>>> #define IB_DEFAULT_GID_PREFIX 0xfe80000000000000ll
>>> @@ -215,7 +216,9 @@
>>> uint32_t ib_rnr_retry;
>>> uint32_t ib_max_rdma_dst_ops;
>>> uint32_t ib_service_level;
>>> - uint32_t ib_path_rec_service_level;
>>> +#if (ENABLE_DYNAMIC_SL)
>>> + uint32_t ib_path_record_service_level;
>>> +#endif
>>> int32_t use_eager_rdma;
>>> int32_t eager_rdma_threshold; /**< After this number of msg, use RDMA for short messages, always */
>>> int32_t eager_rdma_num;
>>>
>>> Modified: trunk/ompi/mca/btl/openib/btl_openib_mca.c
>>> ==============================================================================
>>> --- trunk/ompi/mca/btl/openib/btl_openib_mca.c (original)
>>> +++ trunk/ompi/mca/btl/openib/btl_openib_mca.c 2011-06-28 10:28:29 EDT (Tue, 28 Jun 2011)
>>> @@ -398,10 +398,14 @@
>>> }
>>> mca_btl_openib_component.ib_service_level = (uint32_t) ival;
>>>
>>> - CHECK(reg_int("ib_path_rec_service_level", NULL, "Enable getting InfiniBand service level from PathRecord "
>>> - "(must be >= 0, 0 = disabled, positive = try to get the service level from PathRecord)",
>>> +#if (ENABLE_DYNAMIC_SL)
>>> + CHECK(reg_int("ib_path_record_service_level", NULL,
>>> + "Enable getting InfiniBand service level from PathRecord "
>>> + "(must be >= 0, 0 = disabled, positive = try to get the "
>>> + "service level from PathRecord)",
>>> 0, &ival, REGINT_GE_ZERO));
>>> - mca_btl_openib_component.ib_path_rec_service_level = (uint32_t) ival;
>>> + mca_btl_openib_component.ib_path_record_service_level = (uint32_t) ival;
>>> +#endif
>>>
>>> CHECK(reg_int("use_eager_rdma", NULL, "Use RDMA for eager messages "
>>> "(-1 = use device default, 0 = do not use eager RDMA, "
>>>
>>> Modified: trunk/ompi/mca/btl/openib/connect/btl_openib_connect_oob.c
>>> ==============================================================================
>>> --- trunk/ompi/mca/btl/openib/connect/btl_openib_connect_oob.c (original)
>>> +++ trunk/ompi/mca/btl/openib/connect/btl_openib_connect_oob.c 2011-06-28 10:28:29 EDT (Tue, 28 Jun 2011)
>>> @@ -44,6 +44,10 @@
>>> #include "connect/connect.h"
>>> #include "orte/util/show_help.h"
>>>
>>> +#if (ENABLE_DYNAMIC_SL)
>>> +#include <infiniband/iba/ib_types.h>
>>> +#endif
>>> +
>>> #ifdef HAVE_UNISTD_H
>>> #include <unistd.h>
>>> #endif
>>> @@ -54,109 +58,17 @@
>>> ENDPOINT_CONNECT_ACK
>>> } connect_message_type_t;
>>>
>>> -#ifndef __WINDOWS__
>>> -#define PACK_SUFFIX __attribute__((packed))
>>> -#else
>>> -#define PACK_SUFFIX
>>> -#endif
>>> -
>>> -#define SL_NOT_PRESENT 0x7F
>>> +#define SL_NOT_PRESENT 0xFF
>>> #define MAX_GET_SL_REC_RETRIES 20
>>> #define GET_SL_REC_RETRIES_TIMEOUT_MS 2000000
>>>
>>> -#define IB_SA_QPN 1
>>> -#define IB_GLOBAL_QKEY 0x80010000UL
>>> -#define IB_MGMT_BASE_VERSION 1
>>> -#define IB_MGMT_CLASS_SUBN_ADM 0x03
>>> -#define IB_MGMT_METHOD_GET 0x01
>>> -#define IB_SA_TID_GET_PATH_REC_0 0xCA000000UL
>>> -#define IB_SA_TID_GET_PATH_REC_1 0xBEEF0000UL
>>> -#define IB_PATH_REC_SL_MASK 0x000F
>>> -#define IB_SA_ATTR_PATH_REC 0x35
>>> -#define IB_SA_PATH_REC_DLID (1<<4)
>>> -#define IB_SA_PATH_REC_SLID (1<<5)
>>> -
>>> -
>>> -#ifdef __WINDOWS__
>>> - #pragma pack(push)
>>> - #pragma pack(1)
>>> -#endif
>>> -
>>> -struct ib_mad_hdr {
>>> - uint8_t base_version;
>>> - uint8_t mgmt_class;
>>> - uint8_t class_version;
>>> - uint8_t method;
>>> - uint16_t status;
>>> - uint16_t class_spec;
>>> - uint32_t tid[2];
>>> - uint16_t attr_id;
>>> - uint16_t resv;
>>> - uint32_t attr_mod;
>>> -} PACK_SUFFIX;
>>> -
>>> -struct ib_rmpp_hdr {
>>> - uint32_t raw[3];
>>> -} PACK_SUFFIX;
>>> -
>>> -struct ib_sa_hdr {
>>> - uint32_t sm_key[2];
>>> - uint16_t reserved;
>>> - uint16_t attrib_offset;
>>> - uint32_t comp_mask[2];
>>> -} PACK_SUFFIX;
>>> -
>>> -typedef union _ib_gid {
>>> - uint8_t raw[16];
>>> - struct _ib_gid_unicast {
>>> - uint64_t prefix;
>>> - uint64_t interface_id;
>>> - } PACK_SUFFIX unicast;
>>> - struct _ib_gid_multicast {
>>> - uint8_t header[2];
>>> - uint8_t raw_group_id[14];
>>> - } PACK_SUFFIX multicast;
>>> -} PACK_SUFFIX ib_gid_t;
>>> -
>>> -struct ib_path_record {
>>> - uint64_t service_id;
>>> - ib_gid_t dgit;
>>> - ib_gid_t sgit;
>>> - uint16_t dlid;
>>> - uint16_t slid;
>>> - uint32_t hop_flow_raw;
>>> - uint8_t tclass;
>>> - uint8_t num_path;
>>> - uint16_t pkey;
>>> - uint8_t reserved1;
>>> - uint8_t qos_class_sl;
>>> - uint8_t mtu;
>>> - uint8_t rate;
>>> - uint32_t preference__packet_lifetime__packet_lifetime_selector;
>>> - uint32_t reserved2[35];
>>> -} PACK_SUFFIX;
>>> -
>>> -union ib_sa_data {
>>> - struct ib_path_record path_record;
>>> -} PACK_SUFFIX;
>>> -
>>> -struct ib_mad_sa {
>>> - struct ib_mad_hdr mad_hdr;
>>> - struct ib_rmpp_hdr rmpp_hdr;
>>> - struct ib_sa_hdr sa_hdr;
>>> - union ib_sa_data sa_data;
>>> -} PACK_SUFFIX;
>>> -
>>> -#ifdef __WINDOWS__
>>> - #pragma pack(pop)
>>> -#endif
>>> -
>>> +#if (ENABLE_DYNAMIC_SL)
>>> static struct mca_btl_openib_sa_qp_cache {
>>> /* There will be a MR with the one send and receive buffer together */
>>> /* The send buffer is first, the receive buffer is second */
>>> /* The receive buffer in a UD queue pair needs room for the 40 byte GRH */
>>> /* The buffers are first in the structure for page alignment */
>>> - char send_recv_buffer[sizeof(struct ib_mad_sa) * 2 + 40];
>>> + char send_recv_buffer[MAD_BLOCK_SIZE * 2 + 40];
>>> struct mca_btl_openib_sa_qp_cache *next;
>>> struct ibv_context *context;
>>> char *device_name;
>>> @@ -168,8 +80,9 @@
>>> struct ibv_pd *pd;
>>> struct ibv_recv_wr rwr;
>>> struct ibv_sge rsge;
>>> - char sl_values[65536];
>>> + uint8_t sl_values[65536]; /* 64K */
>>> } *sa_qp_cache = 0;
>>> +#endif
>>>
>>> static int oob_priority = 50;
>>> static bool rml_recv_posted = false;
>>> @@ -198,27 +111,31 @@
>>> static void rml_recv_cb(int status, orte_process_name_t* process_name,
>>> opal_buffer_t* buffer, orte_rml_tag_t tag,
>>> void* cbdata);
>>> +
>>> +#if (ENABLE_DYNAMIC_SL)
>>> static int init_ud_qp(struct ibv_context *context_arg,
>>> struct mca_btl_openib_sa_qp_cache *cache);
>>> static void init_sa_mad(struct mca_btl_openib_sa_qp_cache *cache,
>>> - struct ib_mad_sa *sag,
>>> - struct ibv_send_wr *swr,
>>> - struct ibv_sge *ssge,
>>> - uint16_t lid,
>>> - uint16_t rem_lid);
>>> + ib_sa_mad_t *sa_mad,
>>> + struct ibv_send_wr *swr,
>>> + struct ibv_sge *ssge,
>>> + uint16_t lid,
>>> + uint16_t rem_lid);
>>> static int get_pathrecord_info(struct mca_btl_openib_sa_qp_cache *cache,
>>> - struct ib_mad_sa *sag,
>>> - struct ib_mad_sa *sar,
>>> - struct ibv_send_wr *swr,
>>> - uint16_t lid,
>>> - uint16_t rem_lid);
>>> -static int init_device(struct ibv_context *context_arg,
>>> - struct mca_btl_openib_sa_qp_cache *cache,
>>> - uint32_t port_num);
>>> -static int get_pathrecord_sl(struct ibv_context *context_arg,
>>> - uint32_t port_num,
>>> + ib_sa_mad_t *sa_mad,
>>> + ib_sa_mad_t *sar,
>>> + struct ibv_send_wr *swr,
>>> uint16_t lid,
>>> uint16_t rem_lid);
>>> +static int init_device(struct ibv_context *context_arg,
>>> + struct mca_btl_openib_sa_qp_cache *cache,
>>> + uint32_t port_num);
>>> +static int get_pathrecord_sl(struct ibv_context *context_arg,
>>> + uint32_t port_num,
>>> + uint16_t lid,
>>> + uint16_t rem_lid);
>>> +static void free_sa_qp_cache(void);
>>> +#endif
>>>
>>> /*
>>> * The "component" struct -- the top-level function pointers for the
>>> @@ -351,6 +268,33 @@
>>> return OMPI_SUCCESS;
>>> }
>>>
>>> +#if (ENABLE_DYNAMIC_SL)
>>> +static void free_sa_qp_cache(void)
>>> +{
>>> + struct mca_btl_openib_sa_qp_cache *cache, *tmp;
>>> +
>>> + cache = sa_qp_cache;
>>> + while (NULL != cache) {
>>> + /* free cache data */
>>> + if (cache->device_name)
>>> + free(cache->device_name);
>>> + if (NULL != cache->qp)
>>> + ibv_destroy_qp(cache->qp);
>>> + if (NULL != cache->ah)
>>> + ibv_destroy_ah(cache->ah);
>>> + if (NULL != cache->cq)
>>> + ibv_destroy_cq(cache->cq);
>>> + if (NULL != cache->mr)
>>> + ibv_dereg_mr(cache->mr);
>>> + if (NULL != cache->pd)
>>> + ibv_dealloc_pd(cache->pd);
>>> + tmp = cache->next;
>>> + free(cache);
>>> + cache = tmp;
>>> + }
>>> +}
>>> +#endif
>>> +
>>> /*
>>> * Component finalize function. Cleanup RML non-blocking receive.
>>> */
>>> @@ -360,7 +304,9 @@
>>> orte_rml.recv_cancel(ORTE_NAME_WILDCARD, OMPI_RML_TAG_OPENIB);
>>> rml_recv_posted = false;
>>> }
>>> -
>>> + #if (ENABLE_DYNAMIC_SL)
>>> + free_sa_qp_cache();
>>> +#endif
>>> return OMPI_SUCCESS;
>>> }
>>>
>>> @@ -425,7 +371,7 @@
>>> */
>>> static int qp_connect_all(mca_btl_openib_endpoint_t *endpoint)
>>> {
>>> - int i, rc;
>>> + int i;
>>> mca_btl_openib_module_t* openib_btl =
>>> (mca_btl_openib_module_t*)endpoint->endpoint_btl;
>>>
>>> @@ -446,18 +392,24 @@
>>> attr.ah_attr.dlid = endpoint->rem_info.rem_lid;
>>> attr.ah_attr.src_path_bits = openib_btl->src_path_bits;
>>> attr.ah_attr.port_num = openib_btl->port_num;
>>> - attr.ah_attr.sl = mca_btl_openib_component.ib_service_level;
>>> - /* if user enable ib_path_rec_service_level - dynamically get the sl from PathRecord */
>>> - if (mca_btl_openib_component.ib_path_rec_service_level > 0) {
>>> - rc = get_pathrecord_sl(qp->context,
>>> +
>>> +#if (ENABLE_DYNAMIC_SL)
>>> + /* if user enabled dynamic SL, get it from PathRecord */
>>> + if (0 != mca_btl_openib_component.ib_path_record_service_level) {
>>> + int rc = get_pathrecord_sl(qp->context,
>>> attr.ah_attr.port_num,
>>> openib_btl->lid,
>>> attr.ah_attr.dlid);
>>> if (OMPI_ERROR == rc) {
>>> + free_sa_qp_cache();
>>> return OMPI_ERROR;
>>> }
>>> attr.ah_attr.sl = rc;
>>> }
>>> +#else
>>> + attr.ah_attr.sl = mca_btl_openib_component.ib_service_level;
>>> +#endif
>>> +
>>> /* JMS to be filled in later dynamically */
>>> attr.ah_attr.static_rate = 0;
>>>
>>> @@ -1056,6 +1008,7 @@
>>> OPAL_THREAD_UNLOCK(&mca_btl_openib_component.ib_lock);
>>> }
>>>
>>> +#if (ENABLE_DYNAMIC_SL)
>>> static int init_ud_qp(struct ibv_context *context_arg,
>>> struct mca_btl_openib_sa_qp_cache *cache)
>>> {
>>> @@ -1094,7 +1047,7 @@
>>> memset(&mattr, 0, sizeof(mattr));
>>> mattr.qp_state = IBV_QPS_INIT;
>>> mattr.port_num = cache->port_num;
>>> - mattr.qkey = IB_GLOBAL_QKEY;
>>> + mattr.qkey = ntohl(IB_QP1_WELL_KNOWN_Q_KEY);
>>> rc = ibv_modify_qp(cache->qp, &mattr,
>>> IBV_QP_STATE |
>>> IBV_QP_PKEY_INDEX |
>>> @@ -1128,61 +1081,75 @@
>>> return OMPI_SUCCESS;
>>> }
>>> static void init_sa_mad(struct mca_btl_openib_sa_qp_cache *cache,
>>> - struct ib_mad_sa *sag,
>>> - struct ibv_send_wr *swr,
>>> - struct ibv_sge *ssge,
>>> - uint16_t lid,
>>> - uint16_t rem_lid)
>>> + ib_sa_mad_t *sa_mad,
>>> + struct ibv_send_wr *swr,
>>> + struct ibv_sge *ssge,
>>> + uint16_t lid,
>>> + uint16_t rem_lid)
>>> {
>>> - memset(sag, 0, sizeof(*sag));
>>> + ib_path_rec_t *path_record = (ib_path_rec_t*)sa_mad->data;
>>> +
>>> memset(swr, 0, sizeof(*swr));
>>> memset(ssge, 0, sizeof(*ssge));
>>>
>>> - sag->mad_hdr.base_version = IB_MGMT_BASE_VERSION;
>>> - sag->mad_hdr.mgmt_class = IB_MGMT_CLASS_SUBN_ADM;
>>> - sag->mad_hdr.class_version = 2;
>>> - sag->mad_hdr.method = IB_MGMT_METHOD_GET;
>>> - sag->mad_hdr.attr_id = htons (IB_SA_ATTR_PATH_REC);
>>> - sag->mad_hdr.tid[0] = IB_SA_TID_GET_PATH_REC_0 + cache->qp->qp_num;
>>> - sag->mad_hdr.tid[1] = IB_SA_TID_GET_PATH_REC_1 + rem_lid;
>>> - sag->sa_hdr.comp_mask[1] =
>>> - htonl(IB_SA_PATH_REC_DLID | IB_SA_PATH_REC_SLID);
>>> - sag->sa_data.path_record.dlid = htons(rem_lid);
>>> - sag->sa_data.path_record.slid = htons(lid);
>>> + /* Initialize the standard MAD header. */
>>> + memset(sa_mad, 0, MAD_BLOCK_SIZE);
>>> + ib_mad_init_new((ib_mad_t *)sa_mad, /* mad header pointer */
>>> + IB_MCLASS_SUBN_ADM, /* management class */
>>> + (uint8_t) 2, /* version */
>>> + IB_MAD_METHOD_GET, /* method */
>>> + hton64((uint64_t)lid << 48 | /* transaction ID */
>>> + (uint64_t)rem_lid << 32 |
>>> + (uint64_t)cache->qp->qp_num << 8),
>>> + IB_MAD_ATTR_PATH_RECORD, /* attribute ID */
>>> + 0); /* attribute modifier */
>>> +
>>> + sa_mad->comp_mask = IB_PR_COMPMASK_DLID | IB_PR_COMPMASK_SLID;
>>> + path_record->dlid = htons(rem_lid);
>>> + path_record->slid = htons(lid);
>>>
>>> swr->sg_list = ssge;
>>> swr->num_sge = 1;
>>> swr->opcode = IBV_WR_SEND;
>>> swr->wr.ud.ah = cache->ah;
>>> - swr->wr.ud.remote_qpn = IB_SA_QPN;
>>> - swr->wr.ud.remote_qkey = IB_GLOBAL_QKEY;
>>> + swr->wr.ud.remote_qpn = ntohl(IB_QP1);
>>> + swr->wr.ud.remote_qkey = ntohl(IB_QP1_WELL_KNOWN_Q_KEY);
>>> swr->send_flags = IBV_SEND_SIGNALED | IBV_SEND_SOLICITED;
>>>
>>> - ssge->addr = (uint64_t)(void *)sag;
>>> - ssge->length = sizeof(*sag);
>>> + ssge->addr = (uint64_t)(void *)sa_mad;
>>> + ssge->length = MAD_BLOCK_SIZE;
>>> ssge->lkey = cache->mr->lkey;
>>> }
>>>
>>> static int get_pathrecord_info(struct mca_btl_openib_sa_qp_cache *cache,
>>> - struct ib_mad_sa *sag,
>>> - struct ib_mad_sa *sar,
>>> - struct ibv_send_wr *swr,
>>> - uint16_t lid,
>>> - uint16_t rem_lid)
>>> + ib_sa_mad_t *req_mad,
>>> + ib_sa_mad_t *resp_mad,
>>> + struct ibv_send_wr *swr,
>>> + uint16_t lid,
>>> + uint16_t rem_lid)
>>> {
>>> struct ibv_send_wr *bswr;
>>> struct ibv_wc wc;
>>> struct timeval get_sl_rec_last_sent, get_sl_rec_last_poll;
>>> struct ibv_recv_wr *brwr;
>>> int got_sl_value, get_sl_rec_retries, rc, ne, i;
>>> + ib_path_rec_t *req_path_record = ib_sa_mad_get_payload_ptr(req_mad);
>>> + ib_path_rec_t *resp_path_record = ib_sa_mad_get_payload_ptr(resp_mad);
>>>
>>> got_sl_value = 0;
>>> get_sl_rec_retries = 0;
>>>
>>> + rc = ibv_post_recv(cache->qp, &(cache->rwr), &brwr);
>>> + if (0 != rc) {
>>> + BTL_ERROR(("error posting receive on QP [0x%x] errno says: %s [%d]",
>>> + cache->qp->qp_num, strerror(errno), errno));
>>> + return OMPI_ERROR;
>>> + }
>>> +
>>> while (0 == got_sl_value) {
>>> rc = ibv_post_send(cache->qp, swr, &bswr);
>>> if (0 != rc) {
>>> - BTL_ERROR(("error posing send on QP[%x] errno says: %s [%d]",
>>> + BTL_ERROR(("error posting send on QP [0x%x] errno says: %s [%d]",
>>> cache->qp->qp_num, strerror(errno), errno));
>>> return OMPI_ERROR;
>>> }
>>> @@ -1190,25 +1157,23 @@
>>>
>>> while (0 == got_sl_value) {
>>> ne = ibv_poll_cq(cache->cq, 1, &wc);
>>> - if (ne > 0
>>> - && wc.status == IBV_WC_SUCCESS
>>> - && wc.opcode == IBV_WC_RECV
>>> - && wc.byte_len >= sizeof(*sar)
>>> - && sar->mad_hdr.tid[0] == sag->mad_hdr.tid[0]
>>> - && sar->mad_hdr.tid[1] == sag->mad_hdr.tid[1]) {
>>> - if (0 == sar->mad_hdr.status
>>> - && sar->sa_data.path_record.slid == htons(lid)
>>> - && sar->sa_data.path_record.dlid == htons(rem_lid)) {
>>> + if (ne > 0 &&
>>> + IBV_WC_SUCCESS == wc.status &&
>>> + IBV_WC_RECV == wc.opcode &&
>>> + wc.byte_len >= MAD_BLOCK_SIZE &&
>>> + resp_mad->trans_id == req_mad->trans_id) {
>>> + if (0 == resp_mad->status &&
>>> + req_path_record->slid == htons(lid) &&
>>> + req_path_record->dlid == htons(rem_lid)) {
>>> /* Everything matches, so we have the desired SL */
>>> - cache->sl_values[rem_lid] =
>>> - sar->sa_data.path_record.qos_class_sl & IB_PATH_REC_SL_MASK;
>>> + cache->sl_values[rem_lid] = ib_path_rec_sl(resp_path_record);
>>> got_sl_value = 1; /* still must repost recieve buf */
>>> } else {
>>> /* Probably bad status, unlikely bad lid match. We will */
>>> /* ignore response and let it time out so that we do a */
>>> /* retry, but after a delay. We must make a new TID so */
>>> /* the SM doesn't see it as the same request. */
>>> - sag->mad_hdr.tid[1] += 0x10000;
>>> + req_mad->trans_id += hton64(1);
>>> }
>>> rc = ibv_post_recv(cache->qp, &(cache->rwr), &brwr);
>>> if (0 != rc) {
>>> @@ -1249,7 +1214,6 @@
>>> {
>>> struct ibv_ah_attr aattr;
>>> struct ibv_port_attr pattr;
>>> - struct ibv_recv_wr *brwr;
>>> int rc;
>>>
>>> cache->context = ibv_open_device(context_arg->device);
>>> @@ -1315,16 +1279,10 @@
>>> cache->rwr.sg_list = &(cache->rsge);
>>> memset(&(cache->rsge), 0, sizeof(cache->rsge));
>>> cache->rsge.addr = (uint64_t)(void *)
>>> - (cache->send_recv_buffer + sizeof(struct ib_mad_sa));
>>> - cache->rsge.length = sizeof(struct ib_mad_sa) + 40;
>>> + (cache->send_recv_buffer + MAD_BLOCK_SIZE);
>>> + cache->rsge.length = MAD_BLOCK_SIZE + 40;
>>> cache->rsge.lkey = cache->mr->lkey;
>>>
>>> - rc = ibv_post_recv(cache->qp, &(cache->rwr), &brwr);
>>> - if (0 != rc) {
>>> - BTL_ERROR(("error posing receive on QP[%x] errno says: %s [%d]",
>>> - cache->qp->qp_num, strerror(errno), errno));
>>> - return OMPI_ERROR;
>>> - }
>>> return 0;
>>> }
>>>
>>> @@ -1334,7 +1292,7 @@
>>> uint16_t rem_lid)
>>> {
>>> struct ibv_send_wr swr;
>>> - struct ib_mad_sa *sag, *sar;
>>> + ib_sa_mad_t *req_mad, *resp_mad;
>>> struct ibv_sge ssge;
>>> struct mca_btl_openib_sa_qp_cache *cache;
>>> long page_size = sysconf(_SC_PAGESIZE);
>>> @@ -1342,8 +1300,8 @@
>>>
>>> /* search for a cached item */
>>> for (cache = sa_qp_cache; cache; cache = cache->next) {
>>> - if (strcmp(cache->device_name,
>>> - ibv_get_device_name(context_arg->device)) == 0
>>> + if (0 == strcmp(cache->device_name,
>>> + ibv_get_device_name(context_arg->device))
>>> && cache->port_num == port_num) {
>>> break;
>>> }
>>> @@ -1365,15 +1323,15 @@
>>>
>>> /* if the destination lid SL value is not in the cache, go get it */
>>> if (SL_NOT_PRESENT == cache->sl_values[rem_lid]) {
>>> - /* sag is first buffer, where we build the SA Get request to send */
>>> - sag = (struct ib_mad_sa *)(cache->send_recv_buffer);
>>> + /* sa_mad is first buffer, where we build the SA Get request to send */
>>> + req_mad = (ib_sa_mad_t *)(cache->send_recv_buffer);
>>>
>>> - init_sa_mad(cache, sag, &swr, &ssge, lid, rem_lid);
>>> + init_sa_mad(cache, req_mad, &swr, &ssge, lid, rem_lid);
>>>
>>> - /* sar is the receive buffer (40 byte GRH) */
>>> - sar = (struct ib_mad_sa *)(cache->send_recv_buffer + sizeof(struct ib_mad_sa) + 40);
>>> + /* resp_mad is the receive buffer (40 byte offset is for GRH) */
>>> + resp_mad = (ib_sa_mad_t *)(cache->send_recv_buffer + MAD_BLOCK_SIZE + 40);
>>>
>>> - rc = get_pathrecord_info(cache, sag, sar, &swr, lid, rem_lid);
>>> + rc = get_pathrecord_info(cache, req_mad, resp_mad, &swr, lid, rem_lid);
>>> if (0 != rc) {
>>> return rc;
>>> }
>>> @@ -1382,3 +1340,4 @@
>>> /* now all we do is send back the value laying around */
>>> return cache->sl_values[rem_lid];
>>> }
>>> +#endif
>>> _______________________________________________
>>> svn-full mailing list
>>> svn-f...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/svn-full
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/