+1. This is very helpful info to have.

Best,
Pavel (Pasha) Shamis

On Apr 14, 2014, at 2:57 PM, Mike Dubman <mi...@dev.mellanox.co.il> wrote:

Sure, let's discuss it on the next telecon in a week (Mellanox IL is out of 
office for the holidays and Josh is on vacation).

I think it is a very good feature from an OMPI usability point of view.

Think of it as a programmable version of the release notes. For example:

- In release notes, vendors often specify that OpenMPI-SHMEM with PMI2 requires 
mxm 2.1, slurm 2.6.2+, libibverbs 2.2+, etc.
- The user/site/sysadmin can compile the OpenMPI-SHMEM package with libibverbs 
2.1, mxm 1.5 and slurm 2.6.1, which is perfectly valid and will build and run, 
but is not certified by the vendor because of known issues with this mix.

- The vendor can provide a script (or the site admin can write one based on 
local site certification) that uses ompi_info/oshmem_info to check which library 
versions the current installation was compiled with, warns on a mismatch, and 
saves the hassle of running into well-known issues.
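A minimal sketch, in C, of the comparison step such a script would perform. It assumes the compiled-in library versions have already been extracted from ompi_info/oshmem_info output (the thread does not define that output format, so parsing it is left out). The certified minimums mirror the release-note example above; `example_matrix`, `example_needs_warning` and the other `example_` names are hypothetical and not part of Open MPI.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical certification matrix: minimum certified version per library.
 * Values taken from the release-note example above; illustrative only. */
typedef struct {
    const char *lib;
    const char *min_certified;
} example_cert_t;

static const example_cert_t example_matrix[] = {
    { "mxm", "2.1" }, { "slurm", "2.6.2" }, { "libibverbs", "2.2" },
};

/* Parses "MAJOR[.MINOR[.PATCH]]" into v[0..2]; missing parts become 0.
 * Returns the number of parts parsed, or 0 on failure. */
int example_parse_version(const char *s, int v[3])
{
    v[0] = v[1] = v[2] = 0;
    int n = sscanf(s, "%d.%d.%d", &v[0], &v[1], &v[2]);
    return n > 0 ? n : 0;
}

/* Compares two parsed versions; returns <0, 0 or >0 like strcmp. */
int example_version_cmp(const int a[3], const int b[3])
{
    for (int i = 0; i < 3; i++) {
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    }
    return 0;
}

/* Returns 1 if `found` is older than the certified minimum for `lib`,
 * 0 if it meets it, and -1 if the library is unknown or unparsable. */
int example_needs_warning(const char *lib, const char *found)
{
    assert(lib != NULL && found != NULL);
    for (size_t i = 0; i < sizeof example_matrix / sizeof example_matrix[0]; i++) {
        if (strcmp(example_matrix[i].lib, lib) != 0)
            continue;
        int have[3], want[3];
        if (!example_parse_version(found, have) ||
            !example_parse_version(example_matrix[i].min_certified, want))
            return -1;
        return example_version_cmp(have, want) < 0 ? 1 : 0;
    }
    return -1; /* library not in the certified matrix */
}

/* Prints a warning to stderr for an uncertified version; returns the check result. */
int example_warn_if_uncertified(const char *lib, const char *found)
{
    int r = example_needs_warning(lib, found);
    if (r == 1)
        fprintf(stderr, "WARNING: %s %s is older than the certified minimum\n",
                lib, found);
    return r;
}
```

For instance, `example_needs_warning("slurm", "2.6.1")` returns 1 (older than the certified 2.6.2), while `"2.6.2"` returns 0. The check only compares version numbers; it says nothing about whether a given mix would actually fail.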

I think (and know) that many production environments and OMPI users will be 
happy to have it.




On Mon, Apr 14, 2014 at 6:07 PM, Ralph Castain <r...@open-mpi.org> wrote:
Perhaps this is something best discussed on the weekly telecon? I think you are 
misunderstanding what I'm saying. I'm not heavily against it, but I still don't 
see the value, and dislike making disruptive changes that span the code base 
without first ensuring there is no other viable alternative.

FWIW: Most libraries remain ABI compliant across major releases for exactly the 
reasons you cite. We don't actually support building against one library 
version and running against another for these very reasons - if users do that, 
it is at their own risk. Your change won't resolve that problem as ompi_info is 
just as likely to barf when confronted by that situation - remember, in order 
to register the component, ompi_info has to *load* it first. So any library 
incompatibility may well have already caused a problem.


On Apr 14, 2014, at 7:59 AM, Mike Dubman <mi...@dev.mellanox.co.il> wrote:

There is no correlation between the built-with and running-with versions of 
external libraries supported by OMPI.

A new release of an external library does not mean we should remove code in 
OMPI for all previously supported releases of the same library.

A vendor/site can certify slurm version 2.6.1 while the latest is 2.6.6.
SLURM is not ABI compliant between releases, so the site would like to know the 
active version vs. the certified one in order to issue an early warning.

Why are you so against it? I don't see any issue with printing the external 
library version number in the MCA description; it is something that can improve 
the sysadmin/user experience.




On Mon, Apr 14, 2014 at 5:47 PM, Ralph Castain <r...@open-mpi.org> wrote:

On Apr 14, 2014, at 7:34 AM, Mike Dubman <mi...@dev.mellanox.co.il> wrote:

It is unrelated:

1. OMPI can support, and be built with, many different (or all) versions of an 
external library (for example: libmxm or libslurm).

Not true - we do indeed check the library version in all cases where it 
matters. For example, the case you cite as your true story could easily have 
been prevented by using OMPI_CHECK_PACKAGE to verify that the libmxm had the 
required function in it.

2. The OMPI utility ompi_info can expose the currently available version of 
libmxm/libslurm.

Yes - but what good does that do? Bottom line is that you shouldn't have built 
if that library version isn't supported.


3. The vendor or end user wants to certify a specific version of libmxm or 
libslurm for use in the customer environment.
4. The current way is to put a note into the libmxm/libslurm Release Notes, 
which does not guarantee that the site user/sysadmin will pay attention in a 
production environment.

Again, that's the whole purpose of the configure logic. You are supposed to 
check the library to ensure it is compatible, not just blindly build and then 
make the user figure it out.

5. The suggestion is to use #2 so that a user or vendor can write a script that 
matches the currently available versions against the supported/certified ones 
and lets the admin/user know when the running and supported versions differ.

Like I said, that's the developer's responsibility to get the configure logic 
correct - not the user's responsibility to figure it out after-the-fact.


P.S. Based on a true story :)



On Mon, Apr 14, 2014 at 5:19 PM, Ralph Castain <r...@open-mpi.org> wrote:
<let's be consistent and shift this to the devel list>

I'm still confused - how is that helpful? How was the build allowed to complete 
if the external library version isn't supported?? You should either quietly 
not-build the affected component, or error out if the user specifically 
requested that component be built.

This sounds to me like you have a weakness in your configure logic, and are 
trying to find a bandaid. Perhaps a better solution (that wouldn't cause us to 
change every component in the code base) would be to just add appropriate tests 
to your configure logic so you don't incorrectly build against an unsupported 
library?


On Apr 14, 2014, at 7:12 AM, Mike Dubman <mi...@dev.mellanox.co.il> wrote:

The use case I'm interested in is exposing the compiled-in versions of external 
libraries through ompi_info/oshmem_info.
A user/vendor can then write a small script on top of ompi_info/oshmem_info to 
check whether the running versions are on par with the supported matrix.


On Mon, Apr 14, 2014 at 5:06 PM, Ralph Castain <r...@open-mpi.org> wrote:
Guess I'm a little confused and trying to understand the issue, so let's 
consider a couple of cases:

* If we are building against an unsupported version of an external library, 
that is supposed to be detected by the configure logic, yes?  So you would 
output a nice error message at that time, and stop the build process.

* If we were built against one version of an external library, and someone 
attempts to run us against a different version, you'd have to detect that 
somehow at runtime. I'm not sure how you could reliably do that as the problem 
is likely to manifest itself as an unresolved function (i.e., we use something 
that doesn't exist in the version being used) or a difference in a function 
signature. Either way, you'll either fail to load the library or crash.

So I'm not sure what the added function pointer actually accomplishes. I 
suppose you could use it during ompi_info to display something about what 
version you linked against, but that won't help solve either of the above 
problems.

Could you help explain a little further?

Thanks
Ralph


On Apr 14, 2014, at 5:57 AM, Mike Dubman <mi...@dev.mellanox.co.il> wrote:

+devel mailing list (for web mail archive)


On Sat, Apr 12, 2014 at 9:04 PM, Mike Dubman <mi...@dev.mellanox.co.il> wrote:

Hi,

Could you please advise whether the following is already addressed in the MCA 
architecture, or whether it is something we should add:

Current MCA API:
A new MCA component should provide the descriptor mca_base_component_2_0_0_t, 
which specifies how to init/open/close/query/enable every new component.
The descriptor is also used to specify the version of the MCA framework and the 
version of the MCA component.

What is missing:
Some MCA components are wrappers on top of external libraries, e.g.:

hwloc (libhwloc.so)
usnic (libusnic.so)
fca (libfca.so)
mxm (libmxm.so)
slurm (libslurm.so)
pbs
pmi
openib (libibverbs)
vader (knem, ...)
...

So, it would be very useful if the MCA descriptor had another function pointer 
that returns the version of the external library the component depends on (if 
applicable). ompi_info and oshmem_info would print it if present, allowing 
sysadmins to track vendor-specific dependencies for OMPI (like: mxm compiled 
with libmxm 2.1, usnic with libusnic v1.0, ...) and to warn users if the 
compiled versions do not match the vendor-recommended ones.
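A minimal sketch of what the proposed hook might look like, assuming a heavily simplified descriptor. `example_component_t`, its `ext_lib_version` member and the other `example_` names are hypothetical: the real mca_base_component_2_0_0_t has many more fields and does not contain this pointer. Because the hook is optional, a tool like ompi_info would print the version only when the pointer is set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* HYPOTHETICAL, simplified stand-in for an MCA component descriptor.
 * Only the proposed optional hook is modeled here. */
typedef struct {
    const char *framework_name;            /* e.g. "mtl" */
    const char *component_name;            /* e.g. "mxm" */
    /* Proposed optional hook: returns the version of the external library
     * the component was compiled against, or NULL when not applicable. */
    const char *(*ext_lib_version)(void);
} example_component_t;

/* Hypothetical hook for a component wrapping libmxm. A real component would
 * build this string from the library's own version macros at compile time;
 * the value below is illustrative only. */
static const char *example_mxm_ext_lib_version(void)
{
    return "libmxm 2.1";
}

/* Hypothetical descriptors: one wraps an external library, one does not. */
const example_component_t example_mxm_component  = { "mtl", "mxm", example_mxm_ext_lib_version };
const example_component_t example_self_component = { "btl", "self", NULL };

/* Formats one line the way an info tool might print it. A NULL hook (or a
 * NULL result) means "no external library", so the version is skipped.
 * Returns the snprintf result (length of the full formatted line). */
int example_format_component(const example_component_t *c, char *buf, size_t len)
{
    assert(c != NULL && buf != NULL && len > 0);
    const char *ext = (c->ext_lib_version != NULL) ? c->ext_lib_version() : NULL;
    if (ext != NULL) {
        return snprintf(buf, len, "MCA %s: %s (built with %s)",
                        c->framework_name, c->component_name, ext);
    }
    return snprintf(buf, len, "MCA %s: %s",
                    c->framework_name, c->component_name);
}
```

For the mxm descriptor, `example_format_component` produces `MCA mtl: mxm (built with libmxm 2.1)`; for a component without an external dependency it prints only the framework and component names. Keeping the pointer optional means existing components would not need to change.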

Please advise.

Thanks






_______________________________________________
devel-core mailing list
devel-c...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel-core


_______________________________________________
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2014/04/14507.php

_______________________________________________
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2014/04/14508.php


_______________________________________________
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2014/04/14509.php

_______________________________________________
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2014/04/14510.php


_______________________________________________
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2014/04/14511.php

_______________________________________________
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2014/04/14515.php
