Feedback from call last week:
-----------------------------

- make yoda work with all btls
- pragmas in MPI_Init/Finalize seem sketchy
- possible conflict between opal and oshmem ptmalloc?
- be careful of BTL move to OPAL

Jeff's questions:
-----------------

- All the CMake and .windows stuff can now disappear.

- Why 2 README-SHMEM files?  Seems like they should be part of the main README 
file.

- contrib/tau/readme: should be named README, and readable/non-verbatim text 
should be wrapped to 76 cols

- has the TAU patch been submitted upstream?

- The distr/ directory looks like Mellanox-specific release stuff.  Should that 
be in the community tree?  And if so, why isn't it under contrib/dist/, e.g., 
contrib/dist/mellanox?

- Has the knem patch been submitted upstream?

- If you need to maintain the knem patch in this tree, it should be under the 
oshmem directory, not the top level

- I would strongly discourage using "f77" anywhere in openshmem -- there have 
not been Fortran 77 compilers **in over 25 years**.  MPI has *never* been 
Fortran-77 compliant.  Use "fortran", and your wrapper compiler should be 
"shmemfort", not "shmemf77" (which parallels the move to "mpifort").

- Be aware that Nathan's change to the MCA param system is coming very, very 
soon.  You will likely need to update to match it.

- I see Mellanox/coverity-specific paths in trunk/Makefile.am.  Don't do that.  
IF you want to add a "cov" target, then put a proper configure check looking 
for coverity in the path, and make the "cov" target do something sane when 
coverity is not found in your path.

- configure.ac: We already AC_CHECK_SIZEOF(long); don't put in a 2nd one.

- what is the additional memset() for in ompi_free_list.c?  I see it also in a 
few places in the SM BTL, grdma mpool module, etc.  Why is this necessary?

*** This is probably my biggest problem: I really, really dislike the insertion 
of open shmem flags and functionality in the Open MPI layer.  There are very, 
Very, VERY few abstraction violations between OPAL, ORTE, and OMPI -- and we've 
worked very, very hard to keep it that way.  You're adding a TON of abstraction 
violations (i.e., making the MPI layer aware of the openshmem layer above it).

- btl_openib.c: you have

#if OSHMEM_ENABLED
#else

Temporarily ignoring the fact that this is an abstraction violation, why not 
use #if !OSHMEM_ENABLED?  Having an empty #if block seems just weird.

Later, don't use "#if OSHMEM_ENABLED == 0" -- use "#if !OSHMEM_ENABLED" (that's 
just an OMPI convention).

- This same block in btl_openib.c, though (and many others like it in the 
openib BTL) -- why are you removing that code?  Is an openshmem-compiled code 
base not usable as an MPI code base?

- What is this random permutation of qp's that you put in connect_oob.c?

- Do not use // style comments in C code.
  --> Actually this might warrant a larger discussion.  We officially force a 
C99-compliant compiler now, right?  So do we want to start allowing //-style 
comments?

- I see tabs used in btl_sm.c.  Spaces.  Not tabs.  4 space indents.  Always.

- btl_sm.c: when testing with ==, please put the constants on the left (this is 
defensive programming):

    if (BTL_SM_HDR_TYPE_GET_AS_SEND == frag->hdr->tag)

- btl_sm_component.c: what is the MCA param component_use_knem_value for?  The 
help message string is NULL -- please put in a real message.

- I can't tell what is an improvement for the KNEM support in the SM BTL vs. 
what is oshem specific.  If some stuff is just improvements to KNEM SM BTL 
support, that should come in separately and not be in #if blocks.

- Warning/Info messages should be output with opal_show_help(), not 
opal_output().

- The only changes to trunk/ompi/mca/sbgp/ibnet/Makefile.am are commented out 
things; they should not be added.

- Please explain the pragmas in the MPI API functions (I seem to recall you 
explained these last week, but I don't remember why any more).

- I don't understand the changes made to ompi_info.  Why wouldn't you create 
oshmem-info?  (like orte-info)  Or perhaps this is a good opprotunity to 
re-factor the ompi_info code so that its core can be down in opal, and 
top-level tools like opal-info, orte-info, ompi_info, and oshmem-info can just 
call some common subroutines.  

--> Nathan may have done this in his MCA param base revamp, BTW...  Please 
check with him.

- Why was trunk/opal/etc/openmpi-mca-params.conf modified?  It should not 
include any parameters -- users can put whatever they want in there.

- orte/etc/Makefile.am -- don't create a file from Makefile.am.  Have the file 
already created, and Makefile just installs/uninstalls it.  If you need to 
substitute paths in there, then use the same .in template system that the rest 
of the tree uses.

- orte/etc/Makefile.am: is this the Open MPI implementation of Open Shmem, or 
Mellanox ScaleableSHMEM?

- Is there an oshmemrun, similar to orterun/mpirun?

- How will this code be distributed?  Will openmpi tarballs include the 
openshmem code?  Or will there be separate openshmem tarballs that include all 
of Open MPI + the openshmem project?

- What happens when you autogen/configure/make now; does that include openshmem?

- Related by slightly different: How does one configure openshem vs. openmpi?  
It seems like they are two different things, now (since you change a bunch of 
BTL functionality based on whether openshmem is enabled or not).

- are there any openshmem tests that should be added to MTT?

SUMMARY

- This is much more invasive to the OMPI layer than expected:
  1. Some of the changes seem to be improvements to existing code in OPAL and 
OMPI.  Please just bring these in separately, outside of OSHMEM.  Such 
improvements benefit everyone.
  2. Other changes seem to augmentations for OSHMEM support.

- In general, it seems like Open Shmem was grafted *in to* the MPI layer, 
rather than *on top of* the MPI layer.  I have a large problem with this.  I'd 
rather see a *much* cleaner separation between the two.  I'm sure that some 
level of integration is inevitable -- but the sheer number of #if's all over 
the place is quite worrysome.  Can you separate these better?

- If it would help, perhaps the BTL's functionality should be extended to 
support what openshmem needs, rather than putting in a bunch of #if's in the 
BTLs that openshmem supports.  This might be significantly cleaner, and also 
provide better long-term support for openshmem.

- I don't understand the branding/configury between openshmem and OMPI.  It 
seems like if you're using openshmem, you should be using openshmem-branded 
tools (e.g., openshmemrun).  Just like we do with ORTE -- there are many 
ORTE-only applications out there that use 100% ORTE-branded tools, and never do 
anything with MPI.  The same should be true with openshmem.  Otherwise, it's 
very confusing for end users.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/


Reply via email to