In the new osc component I don't try to handle that case. All atomics have to be done through the same btl (including atomics on self). I did this because, with the default Gemini setup, network atomics and CPU atomics cannot be mixed. If it is possible to mix them on other networks, I would be happy to add an atomic flag for that.

-Nathan

On Wed, Nov 05, 2014 at 03:41:58AM -0500, Joshua Ladd wrote:
> Quick question. Out of curiosity, how do you handle the (common) case of
> mixing network atomics with CPU atomics? Say for a single target with two
> initiators, one initiator is on host with the target, so goes through the
> SM BTL, and the other initiator is off host, so goes through the network
> BTL.
>
> Josh
>
> On Tue, Nov 4, 2014 at 6:46 PM, Nathan Hjelm <hje...@lanl.gov> wrote:
>
> What: Completely revamp the BTL RDMA interface (btl_put, btl_get) to
> better match what is needed for MPI one-sided.
>
> Why: I am preparing to push an enhanced MPI-3 one-sided component that
> makes use of network rdma and atomic operations to provide a fast truly
> one-sided implementation. Before I can push this component I want to
> change the btl interface to:
>
> - Provide access to network atomic operations. I only need add and
>   cswap but the interface can be extended to any number of operations.
>
>   The new interface provides three new functions: btl_atomic_op,
>   btl_atomic_fop, and btl_atomic_cswap. Additionally there are two new
>   btl_flags to indicate available atomic support:
>   MCA_BTL_FLAGS_ATOMIC_OPS, and MCA_BTL_FLAGS_ATOMIC_FOPS. The
>   btl_atomics_flags field has been added to indicate which atomic
>   operations are supported (see mca_btl_base_atomic_op_t). At this time
>   I only added support for 64-bit integer atomics but I am open to
>   adding support for 32-bit as well.
>
> - Provide an interface that will allow simultaneous put/get operations
>   without extra calls into the btl. The current interface requires the
>   btl user to call prepare_src/prepare_dst before every rdma
>   operation. In some cases this is a complete waste (vader, sm with
>   CMA, knem, or xpmem).
>
>   I separated the registration of memory from the segment info. More
>   information is provided below.
>   The new put/get functions have the
>   following signatures:
>
>   typedef int (*mca_btl_base_module_put_fn_t) (struct mca_btl_base_module_t *btl,
>       struct mca_btl_base_endpoint_t *endpoint, void *local_address,
>       uint64_t remote_address, struct mca_btl_base_registration_handle_t *local_handle,
>       struct mca_btl_base_registration_handle_t *remote_handle, size_t size, int flags,
>       int order, mca_btl_base_rdma_completion_fn_t cbfunc, void *cbcontext, void *cbdata);
>
>   typedef int (*mca_btl_base_module_get_fn_t) (struct mca_btl_base_module_t *btl,
>       struct mca_btl_base_endpoint_t *endpoint, void *local_address,
>       uint64_t remote_address, struct mca_btl_base_registration_handle_t *local_handle,
>       struct mca_btl_base_registration_handle_t *remote_handle, size_t size, int flags,
>       int order, mca_btl_base_rdma_completion_fn_t cbfunc, void *cbcontext, void *cbdata);
>
>   typedef void (*mca_btl_base_rdma_completion_fn_t)(
>       struct mca_btl_base_module_t* module,
>       struct mca_btl_base_endpoint_t* endpoint,
>       void *local_address,
>       struct mca_btl_base_registration_handle_t *local_handle,
>       void *context,
>       void *cbdata,
>       int status);
>
>   I may modify the completion function to provide more information on
>   the completed operation (size).
>
> - Allow the registration of an entire region even if the region can not
>   be modified with a single rdma operation. At this time prepare_src
>   and prepare_dst may modify the size and register a smaller
>   region. This will not work.
>
>   This is done in the new interface through the new btl_register_mem,
>   and btl_deregister_mem interfaces. The btl_register_mem interface
>   returns a registration handle of size btl_registration_handle_size
>   that can be used as either the local_handle or remote_handle to any
>   rdma/atomic function. BTLs that do not provide these functions do not
>   require registration for rdma/atomic operations.
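
A minimal sketch of the register-once, put-many flow described above, assuming a toy in-process "btl" whose put is just a memcpy. The struct layouts, the toy_* functions, count_completion, and put_example are hypothetical stand-ins written for this illustration; only the function-pointer signatures follow the typedefs quoted in the proposal. Deregistration is left out because its exact signature is not clear from the quoted text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the opaque types named in the proposal. */
struct mca_btl_base_module_t;
struct mca_btl_base_endpoint_t { int peer; };
struct mca_btl_base_registration_handle_t { void *base; size_t size; };

/* Completion callback type, as quoted in the proposal. */
typedef void (*mca_btl_base_rdma_completion_fn_t)(
    struct mca_btl_base_module_t *module, struct mca_btl_base_endpoint_t *endpoint,
    void *local_address, struct mca_btl_base_registration_handle_t *local_handle,
    void *context, void *cbdata, int status);

/* Toy module holding only the two entry points this sketch needs. */
struct mca_btl_base_module_t {
    struct mca_btl_base_registration_handle_t *(*btl_register_mem)(
        struct mca_btl_base_module_t *btl, struct mca_btl_base_endpoint_t *endpoint,
        void *base, size_t size, uint32_t flags);
    int (*btl_put)(
        struct mca_btl_base_module_t *btl, struct mca_btl_base_endpoint_t *endpoint,
        void *local_address, uint64_t remote_address,
        struct mca_btl_base_registration_handle_t *local_handle,
        struct mca_btl_base_registration_handle_t *remote_handle,
        size_t size, int flags, int order,
        mca_btl_base_rdma_completion_fn_t cbfunc, void *cbcontext, void *cbdata);
};

/* Toy registration: hand out handles from a small static pool. */
static struct mca_btl_base_registration_handle_t toy_handles[4];
static size_t toy_handle_count;

static struct mca_btl_base_registration_handle_t *toy_register_mem(
    struct mca_btl_base_module_t *btl, struct mca_btl_base_endpoint_t *endpoint,
    void *base, size_t size, uint32_t flags)
{
    (void) btl; (void) endpoint; (void) flags;
    if (toy_handle_count == sizeof(toy_handles) / sizeof(toy_handles[0])) {
        return NULL;
    }
    struct mca_btl_base_registration_handle_t *handle = &toy_handles[toy_handle_count++];
    handle->base = base;
    handle->size = size;
    return handle;
}

/* Toy put: the "remote" address lives in this process, so copy and complete at once. */
static int toy_put(
    struct mca_btl_base_module_t *btl, struct mca_btl_base_endpoint_t *endpoint,
    void *local_address, uint64_t remote_address,
    struct mca_btl_base_registration_handle_t *local_handle,
    struct mca_btl_base_registration_handle_t *remote_handle,
    size_t size, int flags, int order,
    mca_btl_base_rdma_completion_fn_t cbfunc, void *cbcontext, void *cbdata)
{
    (void) remote_handle; (void) flags; (void) order;
    memcpy((void *) (uintptr_t) remote_address, local_address, size);
    if (NULL != cbfunc) {
        cbfunc(btl, endpoint, local_address, local_handle, cbcontext, cbdata, 0);
    }
    return 0;
}

/* Completion callback: count successful completions through cbdata. */
static void count_completion(
    struct mca_btl_base_module_t *module, struct mca_btl_base_endpoint_t *endpoint,
    void *local_address, struct mca_btl_base_registration_handle_t *local_handle,
    void *context, void *cbdata, int status)
{
    (void) module; (void) endpoint; (void) local_address; (void) local_handle; (void) context;
    if (0 == status) {
        ++*(int *) cbdata;
    }
}

/* Register both regions once, then issue two puts that reuse the same handles. */
int put_example(void)
{
    struct mca_btl_base_module_t btl = { toy_register_mem, toy_put };
    struct mca_btl_base_endpoint_t peer = { 1 };
    char window[64] = { 0 };  /* stands in for the remote window */
    char src[64];
    int completions = 0;

    toy_handle_count = 0;
    memset(src, 'x', sizeof(src));

    struct mca_btl_base_registration_handle_t *local =
        btl.btl_register_mem(&btl, &peer, src, sizeof(src), 0);
    struct mca_btl_base_registration_handle_t *remote =
        btl.btl_register_mem(&btl, &peer, window, sizeof(window), 0);
    assert(NULL != local && NULL != remote);

    for (size_t off = 0; off < sizeof(src); off += 32) {
        int rc = btl.btl_put(&btl, &peer, src + off, (uint64_t) (uintptr_t) (window + off),
                             local, remote, 32, 0, 0, count_completion, NULL, &completions);
        if (0 != rc) {
            return -1;
        }
    }
    return (2 == completions && 0 == memcmp(src, window, sizeof(src))) ? 0 : -1;
}
```

Calling put_example() from a main function returns 0 when both puts complete and the window matches the source. The point of the sketch is only that no per-operation prepare call sits between the registration and the puts.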
>
>   typedef struct mca_btl_base_registration_handle_t *(*mca_btl_base_module_register_mem_fn_t)(
>       struct mca_btl_base_module_t* btl, struct mca_btl_base_endpoint_t *endpoint, void *base,
>       size_t size, uint32_t flags);
>
>   typedef struct mca_btl_base_registration_handle_t *(*mca_btl_base_module_register_mem_fn_t)(
>       struct mca_btl_base_module_t* btl, struct mca_btl_base_endpoint_t *endpoint, void *base,
>       size_t size, uint32_t flags);
>
> - Expose the limitations of the put and get operations so the caller
>   can make decisions before trying a get or put operation. Two
>   examples: the Gemini interconnect has an alignment restriction on
>   get, openib devices may have a limit on how large a single get/put
>   operation can be. The current interface sort of gives the put limit
>   but it is tied to the rdma pipeline protocol.
>
>   This is done in the new interface by providing btl_get_limit,
>   btl_get_alignment, btl_put_limit, and btl_put_alignment. Operations
>   that violate these restrictions should return OPAL_ERR_BAD_PARAM
>   (operation over limit) or OPAL_ERR_NOT_SUPPORTED (operation not
>   supported due to alignment restrictions with either the source or
>   destination buffer).
>
> This is a big change and I do not expect everyone to like 100% of these
> changes. I welcome any feedback people have.
>
> When: Tuesday, Nov 17, 2015. This is during SC so there will be time for
> face-to-face discussion if anyone has any concerns or would like to see
> something changed.
>
> The proposed new btl interface as well as updated versions of: pml/ob1,
> btl/openib, btl/self, btl/scif, btl/sm, btl/tcp, btl/ugni, and btl/vader
> can be found in my btlmod branch at:
>
> https://github.com/hjelmn/ompi/tree/btlmod
>
> Other btls (smcuda, and usnic) still need to be updated to provide the
> new interface. Unmodified btl will not build.
>
> If there are no objections I will push the btl modifications into the
> master two weeks from today (Nov 17).
> Please take a look and let me know
> what you think.
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/11/16193.php
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/11/16195.php
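
The get limit and alignment fields mentioned in the proposal suggest a caller-side check along the following lines. This is a sketch under assumptions: struct toy_btl_limits and the toy_* functions are hypothetical, the alignment is assumed to be 0 or a power of two, and the numeric values given to the OPAL_* names are placeholders rather than the values in the Open MPI headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Error names come from the proposal; the numeric values are placeholders. */
enum { OPAL_SUCCESS = 0, OPAL_ERR_BAD_PARAM = -1, OPAL_ERR_NOT_SUPPORTED = -2 };

/* Hypothetical container for the two get-related fields named in the proposal. */
struct toy_btl_limits {
    size_t btl_get_limit;     /* largest single get, in bytes */
    size_t btl_get_alignment; /* required address alignment; assumed 0 or a power of two */
};

/* Return the status a get with these arguments would be expected to produce. */
int toy_check_get(const struct toy_btl_limits *btl, const void *local_address,
                  uint64_t remote_address, size_t size)
{
    if (size > btl->btl_get_limit) {
        return OPAL_ERR_BAD_PARAM;          /* operation over limit */
    }
    if (btl->btl_get_alignment > 1) {
        uint64_t mask = (uint64_t) btl->btl_get_alignment - 1;
        if ((((uint64_t) (uintptr_t) local_address) & mask) || (remote_address & mask)) {
            return OPAL_ERR_NOT_SUPPORTED;  /* misaligned source or destination */
        }
    }
    return OPAL_SUCCESS;
}

/* Number of gets needed to move size bytes when each get is capped at the limit. */
size_t toy_get_fragment_count(const struct toy_btl_limits *btl, size_t size)
{
    assert(btl->btl_get_limit > 0);
    return (size + btl->btl_get_limit - 1) / btl->btl_get_limit;
}
```

With a 4096-byte limit and 4-byte alignment, a 10000-byte transfer would be split into three gets, and a get from an odd address would be rejected before it reaches the network. The same pattern would apply to btl_put_limit and btl_put_alignment.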
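
The atomic-support flags from the proposal, combined with the rule in the reply that all atomics go through one btl, could be sketched as a selection step like the one below. Everything prefixed toy_ or TOY_ is hypothetical, the flag bit values are placeholders, and the requirement that a usable btl advertise both flags plus add and cswap is an assumption about what a caller might need, not a statement of what the osc component does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag names come from the proposal; the bit values are placeholders. */
enum {
    MCA_BTL_FLAGS_ATOMIC_OPS  = 0x1,
    MCA_BTL_FLAGS_ATOMIC_FOPS = 0x2
};

/* Hypothetical per-operation bits standing in for mca_btl_base_atomic_op_t support. */
enum {
    TOY_ATOMIC_SUPPORTS_ADD   = 0x1,
    TOY_ATOMIC_SUPPORTS_CSWAP = 0x2
};

/* Hypothetical view of the two fields a caller would inspect. */
struct toy_btl {
    const char *name;
    uint32_t btl_flags;
    uint32_t btl_atomics_flags;
};

/* Non-zero when the btl advertises both atomic flags plus add and cswap. */
int toy_btl_has_needed_atomics(const struct toy_btl *btl)
{
    const uint32_t need_flags = MCA_BTL_FLAGS_ATOMIC_OPS | MCA_BTL_FLAGS_ATOMIC_FOPS;
    const uint32_t need_ops = TOY_ATOMIC_SUPPORTS_ADD | TOY_ATOMIC_SUPPORTS_CSWAP;
    return (btl->btl_flags & need_flags) == need_flags &&
           (btl->btl_atomics_flags & need_ops) == need_ops;
}

/* Pick one btl for every atomic operation; NULL means no btl qualifies. */
const struct toy_btl *toy_select_atomic_btl(const struct toy_btl *btls, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        if (toy_btl_has_needed_atomics(&btls[i])) {
            return &btls[i];
        }
    }
    return NULL;
}
```

Because a single btl is chosen up front, local and remote initiators would all issue their atomics through it, which avoids the mixed CPU/network case raised in the question.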