Great! We should probably try to figure out how the mtl layer can be
modified to expose those atomics. If possible this should be done before
the 1.9 branch to ensure the feature is available in the next release
series.

-Nathan

On Thu, Nov 06, 2014 at 05:15:30PM -0500, Joshua Ladd wrote:
>    MXM supports atomics.
> 
>    On Thursday, November 6, 2014, Nathan Hjelm <hje...@lanl.gov> wrote:
> 
>      I haven't looked at that yet. Would be great to get the new osc component
>      working over both btls and mtls. I know portals supports atomics but I
>      don't know whether psm does.
> 
>      -Nathan
> 
>      On Thu, Nov 06, 2014 at 08:45:15PM +0200, Mike Dubman wrote:
>      >    btw, do you plan to add atomics API to MTL layer as well?
>      >    On Thu, Nov 6, 2014 at 5:23 PM, Nathan Hjelm <hje...@lanl.gov> wrote:
>      >
>      >      At the moment I select the lowest latency BTL that can reach all of the
>      >      ranks in the communicator used to create the window. I can add code to
>      >      round-robin windows over the available BTLs on multi-rail systems.
>      >
>      >      -Nathan
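The selection rule described above (pick the lowest-latency BTL that can reach every rank in the window's communicator) could be sketched roughly as follows. This is only an illustration under stated assumptions: `example_btl_t`, its fields, and `select_btl` are hypothetical names, not Open MPI's actual data structures or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a BTL module (NOT the real
 * mca_btl_base_module_t). */
typedef struct {
    const char *name;
    uint32_t latency;       /* lower is better */
    const bool *reachable;  /* reachable[r] is true if rank r can be reached */
} example_btl_t;

/* Return the index of the lowest-latency module that reaches all
 * nranks ranks, or -1 if no single module reaches every rank. */
static int select_btl(const example_btl_t *btls, size_t nbtls, size_t nranks)
{
    int best = -1;

    for (size_t i = 0; i < nbtls; ++i) {
        bool reaches_all = true;

        for (size_t r = 0; r < nranks; ++r) {
            if (!btls[i].reachable[r]) {
                reaches_all = false;
                break;
            }
        }

        if (reaches_all && (best < 0 || btls[i].latency < btls[best].latency)) {
            best = (int) i;
        }
    }

    return best;
}
```

Choosing exactly one module per window is what keeps all atomics on a single module, which is the coherence concern raised in the quoted message below.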
>      >      On Wed, Nov 05, 2014 at 06:38:25PM -0800, Paul Hargrove wrote:
>      >      >    All atomics must be done through not just "the same btl" but the same btl
>      >      >    MODULE, since atomics from two IB HCAs, for instance, are not necessarily
>      >      >    coherent. So, how is the "best" one to be selected?
>      >      >
>      >      >    -Paul [Sent from my phone]
>      >      >
>      >      >    On Nov 5, 2014 7:15 AM, "Nathan Hjelm" <hje...@lanl.gov> wrote:
>      >      >
>      >      >      In the new osc component I don't try to handle that case. All atomics
>      >      >      have to be done through the same btl (including atomics on self). I did
>      >      >      this because with the default setup of Gemini they can not be mixed. If
>      >      >      it is possible to mix them with other networks I would be happy to add
>      >      >      an atomic flag for that.
>      >      >
>      >      >      -Nathan
>      >      >
>      >      >      On Wed, Nov 05, 2014 at 03:41:58AM -0500, Joshua Ladd wrote:
>      >      >      >    Quick question. Out of curiosity, how do you handle the (common) case of
>      >      >      >    mixing network atomics with CPU atomics? Say for a single target with two
>      >      >      >    initiators, one initiator is on host with the target, so goes through the
>      >      >      >    SM BTL, and the other initiator is off host, so goes through the network
>      >      >      >    BTL.
>      >      >      >
>      >      >      >    Josh
>      >      >      >    On Tue, Nov 4, 2014 at 6:46 PM, Nathan Hjelm <hje...@lanl.gov> wrote:
>      >      >      >
>      >      >      >      What: Completely revamp the BTL RDMA interface (btl_put, btl_get) to
>      >      >      >      better match what is needed for MPI one-sided.
>      >      >      >
>      >      >      >      Why: I am preparing to push an enhanced MPI-3 one-sided component that
>      >      >      >      makes use of network rdma and atomic operations to provide a fast truly
>      >      >      >      one-sided implementation. Before I can push this component I want to
>      >      >      >      change the btl interface to:
>      >      >      >
>      >      >      >       - Provide access to network atomic operations. I only need add and
>      >      >      >         cswap but the interface can be extended to any number of operations.
>      >      >      >
>      >      >      >         The new interface provides three new functions: btl_atomic_op,
>      >      >      >         btl_atomic_fop, and btl_atomic_cswap. Additionally there are two new
>      >      >      >         btl_flags to indicate available atomic support:
>      >      >      >         MCA_BTL_FLAGS_ATOMIC_OPS, and MCA_BTL_FLAGS_ATOMIC_FOPS. The
>      >      >      >         btl_atomics_flags field has been added to indicate which atomic
>      >      >      >         operations are supported (see mca_btl_base_atomic_op_t). At this time
>      >      >      >         I only added support for 64-bit integer atomics but I am open to
>      >      >      >         adding support for 32-bit as well.
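To make the three kinds of operation named above concrete, here is a minimal local sketch of add, fetch-and-add, and compare-and-swap on a 64-bit integer. It uses C11 `<stdatomic.h>` on ordinary memory, not network atomics, and the `example_*` names are hypothetical. It also assumes the usual semantics (the "fop" and "cswap" forms hand back the value the target held before the operation), which the proposal text itself does not spell out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* "op" style: apply the operation, no result is returned. */
static void example_atomic_add(_Atomic int64_t *target, int64_t operand)
{
    (void) atomic_fetch_add(target, operand);
}

/* "fop" style: apply the operation and return the previous value
 * (assumed semantics). */
static int64_t example_atomic_fadd(_Atomic int64_t *target, int64_t operand)
{
    return atomic_fetch_add(target, operand);
}

/* "cswap" style: store value only if *target equals compare. Returns
 * the value held before the call (assumed semantics), so the caller
 * can tell whether the swap happened. */
static int64_t example_atomic_cswap(_Atomic int64_t *target, int64_t compare,
                                    int64_t value)
{
    int64_t expected = compare;

    /* On failure, expected is overwritten with the current value; on
     * success it still equals the old value, so it is the old value
     * either way. */
    atomic_compare_exchange_strong(target, &expected, value);
    return expected;
}
```

A caller would treat a cswap as successful when the returned value equals the `compare` argument it passed in.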
>      >      >      >
>      >      >      >       - Provide an interface that will allow simultaneous put/get operations
>      >      >      >         without extra calls into the btl. The current interface requires the
>      >      >      >         btl user to call prepare_src/prepare_dst before every rdma
>      >      >      >         operation. In some cases this is a complete waste (vader, sm with
>      >      >      >         CMA, knem, or xpmem).
>      >      >      >
>      >      >      >         I separated the registration of memory from the segment info. More
>      >      >      >         information is provided below. The new put/get functions have the
>      >      >      >         following signatures:
>      >      >      >
>      >      >      >      typedef int (*mca_btl_base_module_put_fn_t) (struct mca_btl_base_module_t *btl,
>      >      >      >          struct mca_btl_base_endpoint_t *endpoint, void *local_address,
>      >      >      >          uint64_t remote_address, struct mca_btl_base_registration_handle_t *local_handle,
>      >      >      >          struct mca_btl_base_registration_handle_t *remote_handle, size_t size, int flags,
>      >      >      >          int order, mca_btl_base_rdma_completion_fn_t cbfunc, void *cbcontext, void *cbdata);
>      >      >      >
>      >      >      >      typedef int (*mca_btl_base_module_get_fn_t) (struct mca_btl_base_module_t *btl,
>      >      >      >          struct mca_btl_base_endpoint_t *endpoint, void *local_address,
>      >      >      >          uint64_t remote_address, struct mca_btl_base_registration_handle_t *local_handle,
>      >      >      >          struct mca_btl_base_registration_handle_t *remote_handle, size_t size, int flags,
>      >      >      >          int order, mca_btl_base_rdma_completion_fn_t cbfunc, void *cbcontext, void *cbdata);
>      >      >      >
>      >      >      >      typedef void (*mca_btl_base_rdma_completion_fn_t)(
>      >      >      >          struct mca_btl_base_module_t* module,
>      >      >      >          struct mca_btl_base_endpoint_t* endpoint,
>      >      >      >          void *local_address,
>      >      >      >          struct mca_btl_base_registration_handle_t *local_handle,
>      >      >      >          void *context,
>      >      >      >          void *cbdata,
>      >      >      >          int status);
>      >      >      >
>      >      >      >         I may modify the completion function to provide more information on
>      >      >      >         the completed operation (size).
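As a rough illustration of the callback pattern in the signatures above, the sketch below shows a toy "put" for a shared-memory style transport that needs no prepare step: it copies the data and reports completion through the callback, passing back the caller's context and cbdata. Everything here is a deliberately trimmed-down assumption: `example_put`, `example_completion_fn_t`, and `example_count_completion` are hypothetical names, and the module, endpoint, handle, flags, and order arguments of the proposed interface are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced completion callback. The callback in the
 * proposal also receives the module, endpoint, and local handle. */
typedef void (*example_completion_fn_t)(void *local_address, void *context,
                                        void *cbdata, int status);

/* Toy put: the "remote" address is just another address in this
 * process, so the transfer is a memcpy followed by the callback. */
static int example_put(void *local_address, uint64_t remote_address,
                       size_t size, example_completion_fn_t cbfunc,
                       void *cbcontext, void *cbdata)
{
    memcpy((void *) (uintptr_t) remote_address, local_address, size);

    if (NULL != cbfunc) {
        cbfunc(local_address, cbcontext, cbdata, 0 /* success */);
    }

    return 0;
}

/* Example callback: count successfully completed operations in an
 * int that the caller passes as the context. */
static void example_count_completion(void *local_address, void *context,
                                     void *cbdata, int status)
{
    (void) local_address;
    (void) cbdata;

    if (0 == status) {
        ++*(int *) context;
    }
}
```

A caller could issue several such puts back to back and use the counter to learn when all of them have completed.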
>      >      >      >
>      >      >      >       - Allow the registration of an entire region even if the region can not
>      >      >      >         be modified with a single rdma operation. At this time prepare_src
>      >      >      >         and prepare_dst may modify the size and register a smaller
>      >      >      >         region. This will not work.
>      >      >      >
>      >      >      >         This is done in the new interface through the new btl_register_mem,
>      >      >      >         and btl_deregister_mem interfaces. The btl_register_mem interface
>      >      >      >         returns a registration handle of size btl_registration_handle_size
>      >      >      >         that can be used as either the local_handle or remote_handle to any
>      >      >      >         rdma/atomic function. BTLs that do not provide these functions do not
>      >      >      >         require registration for rdma/atomic operations.
>      >      >      >
>      >      >      >      typedef struct mca_btl_base_registration_handle_t *(*mca_btl_base_module_register_mem_fn_t)(
>      >      >      >          struct mca_btl_base_module_t* btl, struct mca_btl_base_endpoint_t *endpoint, void *base,
>      >      >      >          size_t size, uint32_t flags);
>      >      >      >
>      >      >      >       - Expose the limitations of the put and get operations so the caller
>      >      >      >         can make decisions before trying a get or put operation. Two
>      >      >      >         examples: the Gemini interconnect has an alignment restriction on
>      >      >      >         get, openib devices may have a limit on how large a single get/put
>      >      >      >         operation can be. The current interface sort of gives the put limit
>      >      >      >         but it is tied to the rdma pipeline protocol.
>      >      >      >
>      >      >      >         This is done in the new interface by providing btl_get_limit,
>      >      >      >         btl_get_alignment, btl_put_limit, and btl_put_alignment. Operations
>      >      >      >         that violate these restrictions should return OPAL_ERR_BAD_PARAM
>      >      >      >         (operation over limit) or OPAL_ERR_NOT_SUPPORTED (operation not
>      >      >      >         supported due to alignment restrictions with either the source or
>      >      >      >         destination buffer).
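The point of exposing the limits is that a caller can check them up front. A minimal sketch of such a check for a get, and of how many gets a large transfer would need, might look like this. The struct, function names, and numeric status values are placeholders assumed for illustration; only the field names btl_get_limit and btl_get_alignment and the two OPAL_* error names come from the proposal, and the alignment is assumed to be a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder status codes; the real OPAL_* values are defined by
 * Open MPI and are not reproduced here. */
enum {
    EXAMPLE_SUCCESS = 0,
    EXAMPLE_ERR_BAD_PARAM = -1,     /* stands in for OPAL_ERR_BAD_PARAM */
    EXAMPLE_ERR_NOT_SUPPORTED = -2  /* stands in for OPAL_ERR_NOT_SUPPORTED */
};

/* Hypothetical holder for the two get-related limits. */
typedef struct {
    size_t get_limit;      /* largest single get, in bytes */
    size_t get_alignment;  /* required alignment (power of two); 0 or 1 = none */
} example_limits_t;

/* Decide before issuing a get whether it would be rejected. */
static int example_check_get(const example_limits_t *lim, uintptr_t local,
                             uint64_t remote, size_t size)
{
    if (size > lim->get_limit) {
        return EXAMPLE_ERR_BAD_PARAM;
    }

    if (lim->get_alignment > 1) {
        uint64_t mask = (uint64_t) lim->get_alignment - 1;

        if ((local & mask) || (remote & mask)) {
            return EXAMPLE_ERR_NOT_SUPPORTED;
        }
    }

    return EXAMPLE_SUCCESS;
}

/* Number of gets needed to move size bytes when each one is capped
 * at get_limit bytes (rounds up). */
static size_t example_get_count(const example_limits_t *lim, size_t size)
{
    assert(lim->get_limit > 0);
    return (size + lim->get_limit - 1) / lim->get_limit;
}
```

With a 4096-byte limit, for example, a 10000-byte region would be fetched in three gets rather than failing on a single oversized one.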
>      >      >      >
>      >      >      >      This is a big change and I do not expect everyone to like 100% of these
>      >      >      >      changes. I welcome any feedback people have.
>      >      >      >
>      >      >      >      When: Tuesday, Nov 17, 2015. This is during SC so there will be time for
>      >      >      >      face-to-face discussion if anyone has any concerns or would like to see
>      >      >      >      something changed.
>      >      >      >
>      >      >      >      The proposed new btl interface as well as updated versions of: pml/ob1,
>      >      >      >      btl/openib, btl/self, btl/scif, btl/sm, btl/tcp, btl/ugni, and btl/vader
>      >      >      >      can be found in my btlmod branch at:
>      >      >      >
>      >      >      >      https://github.com/hjelmn/ompi/tree/btlmod
>      >      >      >
>      >      >      >      Other btls (smcuda, and usnic) still need to be updated to provide the
>      >      >      >      new interface. Unmodified btls will not build.
>      >      >      >
>      >      >      >      If there are no objections I will push the btl modifications into the
>      >      >      >      master two weeks from today (Nov 17). Please take a look and let me know
>      >      >      >      what you think.
>      >      >      >
>      >      >      >      _______________________________________________
>      >      >      >      devel mailing list
>      >      >      >      de...@open-mpi.org
>      >      >      >      Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>      >      >      >      Link to this post:
>      >      >      >      http://www.open-mpi.org/community/lists/devel/2014/11/16193.php
>      >      >
>      >
>      >    --
>      >    Kind Regards,
>      >    M.
> 
