On Sat, Mar 3, 2018 at 6:35 PM, Cabral, Matias A <matias.a.cab...@intel.com> wrote:
> Hi George,
>
> Thanks for the feedback, appreciated. A few questions/comments:
>
> > Regarding the tag with your proposal the OFI MTL will support a wider
> > range of tags than the OB1 PML, where we are limited to 16 bits. Just
> > make sure you correctly expose your tag limit via the MPI_TAG_UB.
>
> I will take a look at MPI_TAG_UB.

It is a predefined attribute and should be automatically set by the MPI
layer using the pml_max_tag field of the selected PML.

> > I personally would prefer a solution where we can alter the
> > distribution of bits between the cid and tag at compile time.
>
> Sure, I can do this. What would you suggest for plan B? Fewer tag bits
> and more cid ones? Numbers?

As I mentioned, the OB1 PML only supports 16-bit tags (in fact 15, because
negative tags are reserved for OMPI internal usage). I do not recall any
complaints about this limit. Targeting consistency across PMLs provides
user-friendliness, so a default of 16 bits for the tag and everything
else for the cid might be a sensible choice.

  George.

> > We can also envision this selection to be driven by an MCA parameter,
> > but this might be too costly.
>
> I did think about it. However, as you say, I’m not yet convinced it is
> worth it:
>
> a) I will soon be reviewing the synchronous send protocol. I have not
>    reviewed it thoroughly yet, but I’m quite sure I can reduce it to
>    use 2 bits (maybe just 1), freeing 2 (or 3) more bits for cids or
>    ranks.
>
> b) Most of the providers TODAY effectively support FI_REMOTE_CQ_DATA
>    and FI_DIRECTED_RECV (psm2, gni, verbs;ofi_rxm, sockets). This is
>    just a fallback for potential new ones. FI_DIRECTED_RECV is
>    necessary to discriminate the source at RX time when the source is
>    not in the tag.
>
> c) I will include build_time_plan_B you just suggested ;)
>
> Thanks, again.
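A minimal sketch of the compile-time split discussed above, assuming the fallback layout keeps 18 source bits and 4 protocol bits fixed and only the tag width is a build-time knob. All EXAMPLE_* names and example_max_tag() are hypothetical and are not taken from the Open MPI sources; the default of 16 tag bits follows the suggestion in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical build-time knob: override with -DEXAMPLE_TAG_BITS=<n>.
 * The default of 16 mirrors the OB1-compatible choice suggested above. */
#ifndef EXAMPLE_TAG_BITS
#define EXAMPLE_TAG_BITS 16
#endif

/* Widths assumed fixed, taken from the fallback proposal in this thread. */
#define EXAMPLE_SOURCE_BITS 18
#define EXAMPLE_PROTO_BITS  4

/* Whatever is left of the 64-bit OFI tag goes to the communicator id. */
#define EXAMPLE_CID_BITS \
    (64 - EXAMPLE_SOURCE_BITS - EXAMPLE_PROTO_BITS - EXAMPLE_TAG_BITS)

static_assert(EXAMPLE_TAG_BITS >= 1 && EXAMPLE_TAG_BITS <= 32,
              "tag width must be between 1 and 32 bits");
static_assert(EXAMPLE_CID_BITS >= 1 && EXAMPLE_CID_BITS <= 63,
              "no bits left for the communicator id");

/* Field order, most to least significant: cid | source | proto | tag. */
#define EXAMPLE_TAG_SHIFT    0
#define EXAMPLE_PROTO_SHIFT  (EXAMPLE_TAG_SHIFT + EXAMPLE_TAG_BITS)
#define EXAMPLE_SOURCE_SHIFT (EXAMPLE_PROTO_SHIFT + EXAMPLE_PROTO_BITS)
#define EXAMPLE_CID_SHIFT    (EXAMPLE_SOURCE_SHIFT + EXAMPLE_SOURCE_BITS)

/* `bits` consecutive ones starting at bit `shift` (bits must be 1..63). */
#define EXAMPLE_MASK(bits, shift) ((((uint64_t)1 << (bits)) - 1) << (shift))

#define EXAMPLE_TAG_MASK    EXAMPLE_MASK(EXAMPLE_TAG_BITS, EXAMPLE_TAG_SHIFT)
#define EXAMPLE_PROTO_MASK  EXAMPLE_MASK(EXAMPLE_PROTO_BITS, EXAMPLE_PROTO_SHIFT)
#define EXAMPLE_SOURCE_MASK EXAMPLE_MASK(EXAMPLE_SOURCE_BITS, EXAMPLE_SOURCE_SHIFT)
#define EXAMPLE_CID_MASK    EXAMPLE_MASK(EXAMPLE_CID_BITS, EXAMPLE_CID_SHIFT)

/* Largest value that fits in the tag field. The value actually advertised
 * through MPI_TAG_UB may be lower if part of the range is reserved. */
static inline uint64_t example_max_tag(void)
{
    return EXAMPLE_TAG_MASK >> EXAMPLE_TAG_SHIFT;
}
```

With the default of 16 tag bits this leaves 26 bits for the cid. Building with a different -DEXAMPLE_TAG_BITS value shifts the balance without touching the rest of the code, and the static assertions reject splits that do not fit in 64 bits.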
> _MAC
>
> *From:* devel [mailto:devel-boun...@lists.open-mpi.org] *On Behalf Of* George Bosilca
> *Sent:* Saturday, March 03, 2018 6:29 AM
> *To:* Open MPI Developers <devel@lists.open-mpi.org>
> *Subject:* Re: [OMPI devel] Default tag for OFI MTL
>
> Hi Matias,
>
> Relaxing the restriction on the number of ranks is definitely a good
> thing. The cost will be reflected on the number of communicators and
> tags, and we must be careful how we balance this.
>
> Assuming context_id is the communicator cid, with 10 bits you can only
> support 1024. A little low, even lower than MVAPICH. The way we
> allocate cids is very sparse, and with a limited number of possible
> cids we might run into trouble very quickly for the few applications
> that use a large number of communicators, and for the resilience
> support. Yet another reason to revisit the cid allocation in the short
> term.
>
> Regarding the tag with your proposal the OFI MTL will support a wider
> range of tags than the OB1 PML, where we are limited to 16 bits. Just
> make sure you correctly expose your tag limit via the MPI_TAG_UB.
>
> I personally would prefer a solution where we can alter the
> distribution of bits between the cid and tag at compile time. We can
> also envision this selection to be driven by an MCA parameter, but
> this might be too costly.
>
>   George.
>
> On Sat, Mar 3, 2018 at 2:56 AM, Cabral, Matias A <matias.a.cab...@intel.com> wrote:
>
> Hi all,
>
> I’m working on extending the OFI MTL to support FI_REMOTE_CQ_DATA (1)
> to extend the number of ranks currently supported by the MTL, which is
> currently limited to the 16 bits included in the OFI tag (2). After
> the feature is implemented there will be no limitation for providers
> that support FI_REMOTE_CQ_DATA and FI_DIRECTED_RECV (3). However,
> there will be a fallback mode for providers that do not support these
> features and I would like to get consensus on the default tag
> distribution. This is my proposal:
>
> * Default: No FI_REMOTE_CQ_DATA
> * 01234567 01| 234567 01234567 0123| 4567 |01234567 01234567 01234567 01234567
> * context_id |     source rank     |proto |            message tag
>
> #define MTL_OFI_CONTEXT_MASK       (0xFFC0000000000000ULL)
> #define MTL_OFI_SOURCE_MASK        (0x003FFFF000000000ULL)
> #define MTL_OFI_SOURCE_BITS_COUNT  (18) /* 262,143 ranks */
> #define MTL_OFI_CONTEXT_BITS_COUNT (10) /* 1,023 communicators */
> #define MTL_OFI_TAG_BITS_COUNT     (32) /* no restrictions */
> #define MTL_OFI_PROTO_BITS_COUNT   (4)
>
> Notes:
>
> - More ranks and fewer context ids than the current implementation.
>
> - Moved the protocol bits away from the most significant bits because
>   some providers may reserve bits starting from there (see
>   mem_tag_format (4)), in which case sync send would not work.
>
> Thoughts?
>
> Today we had a call with Howard (LANL), John and Hamuri (HPE) and
> briefly talked about this, and also thought about sending this email
> as a query to find other developers keeping an eye on OFI support in
> OMPI.
>
> Thanks,
>
> _MAC
>
> (1) https://ofiwg.github.io/libfabric/master/man/fi_cq.3.html
> (2) https://github.com/open-mpi/ompi/blob/master/ompi/mca/mtl/ofi/mtl_ofi_types.h#L70
> (3) https://ofiwg.github.io/libfabric/master/man/fi_getinfo.3.html
> (4) https://ofiwg.github.io/libfabric/master/man/fi_endpoint.3.html
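A minimal sketch of how the proposed fallback layout (10 context bits, 18 source bits, 4 protocol bits, 32 tag bits) could be packed into and unpacked from a 64-bit OFI tag. The two MTL_OFI_* masks follow the proposal, with the source mask written as 16 hex digits so that it covers exactly bits 36 to 53. The EXAMPLE_* masks and shifts and the example_* helpers are assumptions made for illustration and are not taken from the Open MPI sources.

```c
#include <assert.h>
#include <stdint.h>

/* Masks from the proposal (fallback mode, no FI_REMOTE_CQ_DATA). */
#define MTL_OFI_CONTEXT_MASK (0xFFC0000000000000ULL) /* bits 54..63 */
#define MTL_OFI_SOURCE_MASK  (0x003FFFF000000000ULL) /* bits 36..53 */

/* Assumed for this sketch; the proposal does not spell these out. */
#define EXAMPLE_PROTO_MASK   (0x0000000F00000000ULL) /* bits 32..35 */
#define EXAMPLE_TAG_MASK     (0x00000000FFFFFFFFULL) /* bits  0..31 */

#define EXAMPLE_PROTO_SHIFT   32
#define EXAMPLE_SOURCE_SHIFT  36
#define EXAMPLE_CONTEXT_SHIFT 54

/* The four fields must tile the whole 64-bit tag. */
static_assert((MTL_OFI_CONTEXT_MASK | MTL_OFI_SOURCE_MASK |
               EXAMPLE_PROTO_MASK | EXAMPLE_TAG_MASK) == UINT64_MAX,
              "fields must cover all 64 bits");

/* Build the 64-bit match tag; values too wide for a field are truncated. */
static inline uint64_t example_pack(uint64_t cid, uint64_t src,
                                    uint64_t proto, uint64_t tag)
{
    return ((cid   << EXAMPLE_CONTEXT_SHIFT) & MTL_OFI_CONTEXT_MASK) |
           ((src   << EXAMPLE_SOURCE_SHIFT)  & MTL_OFI_SOURCE_MASK)  |
           ((proto << EXAMPLE_PROTO_SHIFT)   & EXAMPLE_PROTO_MASK)   |
           (tag & EXAMPLE_TAG_MASK);
}

static inline uint32_t example_get_cid(uint64_t match_bits)
{
    return (uint32_t)((match_bits & MTL_OFI_CONTEXT_MASK) >> EXAMPLE_CONTEXT_SHIFT);
}

static inline uint32_t example_get_source(uint64_t match_bits)
{
    return (uint32_t)((match_bits & MTL_OFI_SOURCE_MASK) >> EXAMPLE_SOURCE_SHIFT);
}

static inline uint32_t example_get_proto(uint64_t match_bits)
{
    return (uint32_t)((match_bits & EXAMPLE_PROTO_MASK) >> EXAMPLE_PROTO_SHIFT);
}

static inline uint32_t example_get_tag(uint64_t match_bits)
{
    return (uint32_t)(match_bits & EXAMPLE_TAG_MASK);
}

/* Bits a wildcard receive would leave out of the comparison. */
static inline uint64_t example_ignore_bits(int any_source, int any_tag)
{
    uint64_t ignore = 0;
    if (any_source) ignore |= MTL_OFI_SOURCE_MASK;
    if (any_tag)    ignore |= EXAMPLE_TAG_MASK;
    return ignore;
}
```

For instance, example_pack(5, 1000, 2, 42) yields a tag from which the four getters return 5, 1000, 2 and 42 again. The result of example_ignore_bits() is the kind of value that could be passed as the ignore argument of libfabric's fi_trecv() when the source or the tag is a wildcard; how the MTL actually handles wildcards is not covered in this thread.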
_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel