Sure, let's discuss it on the next telecon in one week (Mellanox IL is out of office for the holidays and Josh is on vacation).
I think it is a very good feature from an OMPI usability point of view. Think of it as a programmable version of the release notes. For example:

- In the release notes, vendors often specify that OpenMPI-SHMEM with PMI2 requires mxm 2.1, slurm 2.6.2+, libibverbs 2.2+, etc.
- The user/site/sysadmin can compile the OpenMPI-SHMEM package with libibverbs 2.1, mxm 1.5 and slurm 2.6.1. That is a perfectly valid build that builds and runs without complaint, but it is not certified by the vendor because of known issues with this mix.
- The vendor can provide a script (or the site admin can write one based on local site certification) that uses ompi_info/oshmem_info to check which library versions the current installation was compiled with, warns about a mismatch, and saves the hassle of running into well-known issues.

I think (and know) that many production environments and OMPI users will be happy to have it.

On Mon, Apr 14, 2014 at 6:07 PM, Ralph Castain <r...@open-mpi.org> wrote:

> Perhaps this is something best discussed on the weekly telecon? I think you are misunderstanding what I'm saying. I'm not heavily against it, but I still don't see the value, and dislike making disruptive changes that span the code base without first ensuring there is no other viable alternative.
>
> FWIW: Most libraries remain ABI compliant across major releases for exactly the reasons you cite. We don't actually support building against one library version and running against another for these very reasons - if users do that, it is at their own risk. Your change won't resolve that problem as ompi_info is just as likely to barf when confronted by that situation - remember, in order to register the component, ompi_info has to *load* it first. So any library incompatibility may well have already caused a problem.
>
>
> On Apr 14, 2014, at 7:59 AM, Mike Dubman <mi...@dev.mellanox.co.il> wrote:
>
> There is no correlation between built_with and running_with versions of external libraries supported by OMPI.
>
> The next release of an external library does not mean we should remove code in OMPI for all previously supported releases of the same library.
>
> A vendor/site can certify slurm version 2.6.1 while the latest is 2.6.6. SLURM is not ABI compliant between releases, so the site would like to know the active version vs. the certified one in order to issue an early warning.
>
> Why are you so against it? I don't see any issue with printing the external library version number in the MCA description, something that can improve the sysadmin/user experience.
>
>
> On Mon, Apr 14, 2014 at 5:47 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> On Apr 14, 2014, at 7:34 AM, Mike Dubman <mi...@dev.mellanox.co.il> wrote:
>>
>> it is unrelated:
>>
>> 1. OMPI can support, and be built with, many different (or all) versions of an external library (for example, libmxm or libslurm).
>>
>> Not true - we do indeed check the library version in all cases where it matters. For example, the case you cite as your true story could easily have been prevented by using OMPI_CHECK_PACKAGE to verify that libmxm had the required function in it.
>>
>> 2. The OMPI utility ompi_info can expose the currently available version of libmxm/libslurm.
>>
>> Yes - but what good does that do? Bottom line is that you shouldn't have built if that library version isn't supported.
>>
>> 3. The vendor or end user wants to certify a specific version of libmxm or libslurm to be used in the customer environment.
>>
>> 4. The current way is to put a note into the libmxm/libslurm release notes, which does not guarantee that the site user/sysadmin will pay attention in a production environment.
>>
>> Again, that's the whole purpose of the configure logic. You are supposed to check the library to ensure it is compatible, not just blindly build and then make the user figure it out.
>>
>> 5. The suggestion is to use #2 so that the user or vendor can write a script that matches the currently available versions against the supported/certified ones and lets the admin/user know when there is a mismatch between the running and supported versions.
>>
>> Like I said, that's the developer's responsibility to get the configure logic correct - not the user's responsibility to figure it out after the fact.
>>
>> P.S. Based on a true story :)
>>
>>
>> On Mon, Apr 14, 2014 at 5:19 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>
>>> <let's be consistent and shift this to the devel list>
>>>
>>> I'm still confused - how is that helpful? How was the build allowed to complete if the external library version isn't supported?? You should either quietly not-build the affected component, or error out if the user specifically requested that component be built.
>>>
>>> This sounds to me like you have a weakness in your configure logic, and are trying to find a bandaid. Perhaps a better solution (that wouldn't cause us to change every component in the code base) would be to just add appropriate tests to your configure logic so you don't incorrectly build against an unsupported library?
>>>
>>>
>>> On Apr 14, 2014, at 7:12 AM, Mike Dubman <mi...@dev.mellanox.co.il> wrote:
>>>
>>> The use case I'm interested in is exposing the compiled-in versions of external libraries through ompi_info/oshmem_info.
>>> The user/vendor can write a small script on top of ompi_info/oshmem_info to check whether the running versions are on par with the supported matrix.
>>>
>>>
>>> On Mon, Apr 14, 2014 at 5:06 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>
>>>> Guess I'm a little confused and trying to understand the issue, so let's consider a couple of cases:
>>>>
>>>> * If we are building against an unsupported version of an external library, that is supposed to be detected by the configure logic, yes? So you would output a nice error message at that time, and stop the build process.
>>>>
>>>> * If we were built against one version of an external library, and someone attempts to run us against a different version, you'd have to detect that somehow at runtime. I'm not sure how you could reliably do that as the problem is likely to manifest itself as an unresolved function (i.e., we use something that doesn't exist in the version being used) or a difference in a function signature. Either way, you'll either fail to load the library or crash.
>>>>
>>>> So I'm not sure what the added function pointer actually accomplishes. I suppose you could use it during ompi_info to display something about what version you linked against, but that won't help solve either of the above problems.
>>>>
>>>> Could you help explain a little further?
>>>>
>>>> Thanks
>>>> Ralph
>>>>
>>>>
>>>> On Apr 14, 2014, at 5:57 AM, Mike Dubman <mi...@dev.mellanox.co.il> wrote:
>>>>
>>>> +devel mailing list (for web mail archive)
>>>>
>>>>
>>>> On Sat, Apr 12, 2014 at 9:04 PM, Mike Dubman <mi...@dev.mellanox.co.il> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Could you please advise whether the following is addressed in the MCA architecture, or whether it is something we should add:
>>>>>
>>>>> Current MCA API:
>>>>> A new MCA component should provide an mca_base_component_2_0_0_t descriptor, which specifies how to init/open/close/query/enable the component.
>>>>> The descriptor is also used to specify the version of the MCA framework and the version of the MCA component.
>>>>>
>>>>> What is missing:
>>>>> Some MCA components are wrappers on top of external libraries, e.g.:
>>>>>
>>>>> hwloc (libhwloc.so)
>>>>> usnic (libusnic.so)
>>>>> fca (libfca.so)
>>>>> mxm (libmxm.so)
>>>>> slurm (libslurm.so)
>>>>> pbs
>>>>> pmi
>>>>> openib (libibverbs)
>>>>> vader (knem, ...)
>>>>> ...
>>>>>
>>>>> So it would be very useful if the MCA descriptor had another function pointer that returns the version of the external library the component depends on (if applicable).
>>>>> ompi_info and oshmem_info would print it if present, which would allow the sysadmin to track vendor-specific dependencies for OMPI (e.g., mxm compiled with libmxm 2.1, usnic with libusnic v1.0, ...) and warn users if the compiled versions do not match the vendor-recommended ones.
>>>>>
>>>>> Please advise.
>>>>>
>>>>> Thanks
>>>>>
>>>> _______________________________________________
>>>> devel-core mailing list
>>>> devel-c...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel-core
>>>>
>>>
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2014/04/14507.php
>>>
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/04/14508.php
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/04/14509.php
>>
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/04/14510.php
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/04/14511.php
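The original proposal in this thread asks for an optional function pointer in the MCA component descriptor that reports the version of the wrapped external library, which ompi_info/oshmem_info would print when it is present. The following is a minimal C sketch of that idea. It is illustrative only: `example_component_t`, `ext_lib_version_fn_t` and `format_component_line()` are hypothetical names, the struct is not the real `mca_base_component_2_0_0_t` layout, and the output format is invented rather than taken from ompi_info.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback: returns the version string of the external library
 * the component was built against (e.g. captured from the library's headers
 * at build time), or NULL if unknown. */
typedef const char *(*ext_lib_version_fn_t)(void);

/* Hypothetical, trimmed-down stand-in for a component descriptor.
 * This is NOT the real mca_base_component_2_0_0_t layout. */
typedef struct {
    const char *framework;                /* e.g. "pml" */
    const char *name;                     /* e.g. "example" */
    int major, minor, release;            /* component version */
    const char *ext_lib_name;             /* NULL if no external library */
    ext_lib_version_fn_t ext_lib_version; /* optional; may be NULL */
} example_component_t;

/* Format one ompi_info-style line into buf. The external library part is
 * appended only when the component provides the optional callback.
 * Returns the number of characters written, or -1 if buf is too small. */
int format_component_line(const example_component_t *c, char *buf, size_t len)
{
    assert(c != NULL && buf != NULL);
    int n = snprintf(buf, len, "MCA %s: %s (v%d.%d.%d)",
                     c->framework, c->name, c->major, c->minor, c->release);
    if (n < 0 || (size_t)n >= len)
        return -1;
    if (c->ext_lib_version != NULL) {
        const char *v = c->ext_lib_version();
        int m = snprintf(buf + n, len - (size_t)n, ", built with %s %s",
                         c->ext_lib_name ? c->ext_lib_name : "external library",
                         v ? v : "unknown");
        if (m < 0 || (size_t)m >= len - (size_t)n)
            return -1;
        n += m;
    }
    return n;
}
```

Components that wrap no external library leave the callback NULL and print exactly as before. As Ralph points out, ompi_info has to load a component before it can call such a callback, so this reports only what the component was built against. It cannot catch a runtime library incompatibility that already prevents the component from loading.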
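The "programmable release notes" idea is that a vendor or site script compares the compiled-in library versions reported by ompi_info/oshmem_info against a certified matrix and warns on a mismatch. The thread talks about a script. The sketch below shows only the comparison logic, in C, and assumes the versions have already been extracted as plain "major.minor.patch" strings. `cert_entry_t`, `version_at_least()` and `check_matrix()` are hypothetical names. Each certified version is treated as a minimum, which is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical entry in a site/vendor certification matrix. */
typedef struct {
    const char *library;       /* e.g. "slurm" */
    const char *compiled;      /* version the build reports (assumed already extracted) */
    const char *certified_min; /* lowest version the vendor/site certified */
} cert_entry_t;

/* Parse "major[.minor[.patch]]" into v[0..2]; missing parts default to 0.
 * Trailing text after the last number (e.g. "rc1") is ignored.
 * Returns 0 on success, -1 if no leading number is found. */
static int parse_version(const char *s, int v[3])
{
    v[0] = v[1] = v[2] = 0;
    if (s == NULL)
        return -1;
    return sscanf(s, "%d.%d.%d", &v[0], &v[1], &v[2]) >= 1 ? 0 : -1;
}

/* Returns 1 if have >= min, 0 if have is older, -1 if either is unparsable. */
int version_at_least(const char *have, const char *min)
{
    int h[3], m[3];
    if (parse_version(have, h) != 0 || parse_version(min, m) != 0)
        return -1;
    for (int i = 0; i < 3; i++) {
        if (h[i] != m[i])
            return h[i] > m[i] ? 1 : 0;
    }
    return 1; /* identical versions count as certified */
}

/* Count (and optionally print) one warning per library that is older than,
 * or cannot be compared with, its certified minimum. Pass out == NULL to
 * only count. Returns the number of warnings. */
int check_matrix(const cert_entry_t *entries, size_t n, FILE *out)
{
    assert(entries != NULL || n == 0);
    int warnings = 0;
    for (size_t i = 0; i < n; i++) {
        const cert_entry_t *e = &entries[i];
        if (version_at_least(e->compiled, e->certified_min) == 1)
            continue;
        warnings++;
        if (out != NULL)
            fprintf(out, "WARNING: %s %s is not certified (certified minimum: %s)\n",
                    e->library,
                    e->compiled ? e->compiled : "(unknown)",
                    e->certified_min ? e->certified_min : "(unknown)");
    }
    return warnings;
}
```

For the mix described in the thread (libibverbs 2.1, mxm 1.5 and slurm 2.6.1 against minimums of 2.2, 2.1 and 2.6.2), `check_matrix()` reports three warnings, while slurm 2.6.6 against a 2.6.2 minimum passes. The comparison is purely numeric, so vendor-specific version schemes would need their own parsing.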