Brian and I chatted about this on the phone today. Conclusions that
we came to:
1. We need to add a few lines of code to ensure that the MCA base
refuses to open components that have a different MCA version number
(i.e., dlopen a DSO, dlsym to get the component struct, check the
version number, if it's not the same MCA major.minor as our MCA
major.minor, dlclose it). This is easy to do; I'll add it to the hg
repository.
2. Let's set the precedent now that changing the MCA version does
*not* force a change of all the framework version numbers. The
framework version numbers refer to their interfaces. Rather, it's a
triple of (MCA,framework,component) version numbers that uniquely
identify a component.
3. The load-time issues of mixing multiple MCA versions are solved by
points #1 and #2.
4. Leave the bump of all framework versions to 2.0 in place because a
good number of them had to be bumped anyway. We're probably bumping a
few that didn't actually need to be bumped (i.e., those that didn't
actually change since the v1.2 series), but what the heck -- most of
them have changed, and it's a bunch of work to roll all that out. So
let's just bump them, but not because we bumped the MCA version
number; rather, we bump them because we knew that most of them needed
to be bumped, but were too lazy to check and see exactly which ones
needed it (hey, let's be honest here...).
If no one has any objections to this, I'll bring this stuff into the
trunk at the original timeout -- Friday COB (i.e., tomorrow).
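
For reference, here is a minimal sketch of the check described in point 1. It is not the actual MCA base code: the struct layout, the names (component_header, open_component, MY_MCA_MAJOR, MY_MCA_MINOR), and the idea of passing the symbol name in as a string are all assumptions made for illustration. Only the dlopen / dlsym / compare / dlclose sequence comes from the text above.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical: the MCA version this library was built against. */
#define MY_MCA_MAJOR 2
#define MY_MCA_MINOR 0

/* Hypothetical, simplified stand-in for the start of a component struct. */
struct component_header {
    int mca_major;
    int mca_minor;
};

/* Only MCA major.minor have to match; nothing else is compared here. */
static int mca_version_matches(const struct component_header *c)
{
    assert(c != NULL);
    return c->mca_major == MY_MCA_MAJOR && c->mca_minor == MY_MCA_MINOR;
}

/* Open a DSO, look up its component struct, and keep it only if the MCA
 * major.minor match ours. Returns the dlopen handle, or NULL if the DSO
 * could not be opened, the symbol is missing, or the version differs. */
static void *open_component(const char *path, const char *symbol,
                            const struct component_header **out)
{
    void *handle;
    const struct component_header *c;

    assert(path != NULL && symbol != NULL && out != NULL);
    *out = NULL;

    handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        return NULL;
    }

    c = (const struct component_header *) dlsym(handle, symbol);
    if (c == NULL || !mca_version_matches(c)) {
        dlclose(handle);   /* wrong MCA version (or no struct): refuse it */
        return NULL;
    }

    *out = c;
    return handle;
}
```

On some systems this needs to be linked with -ldl. A caller would treat a NULL return as "component not available" and move on to the next DSO.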
On Jul 21, 2008, at 8:55 PM, Jeff Squyres wrote:
On Jul 21, 2008, at 6:57 PM, Brian W. Barrett wrote:
I guess I don't understand. I thought there were three versions in
every component -- the MCA version, the framework version, and the
component version. The first two should determine if the component
can safely be loaded and the third is to identify the component. I
agree that for this change (an MCA-level change), the MCA version
*should* change. However, the framework interface didn't change
(well, not as a result of this change), meaning that the framework
version *should not* change. The MCA load infrastructure should see
that the MCA versions don't match, and not load the component.
Josh and I wrestled with this question for a bit and probably fell
down on the side of conservatism; that's where this came from.
There were two reasons why we went this way:
1. You could (for example) have a coll framework v1.2.3 component
built with MCA v1.0.0 and the same coll framework v1.2.3 component
built against MCA v2.0.0, and they would be different. Worse, they
won't be "equal". Specifically, MCA 2.0.0 supports some minor
features that v1.0.0 doesn't -- so even though you have 2 of the
"same" component, they're not really the same. (*more on this below)
2. Another issue seemed pretty icky to solve, which led us to fall
down a little heavier on the side of bumping all the framework
version numbers. Let's say you have some Foo framework DSOs, some
of which are MCA v1.0.0 and some of which are v2.0.0. The Foo
framework interface is the same between the two. The MCA base can
find/open all of them easily enough; but how do we return all the
components to the caller? I could think of 3 ways:
A. return multiple lists to the caller: a list of each of v1.0.0
and v2.0.0 components. This means that every framework will need to
handle (or be able to reject or specify to the MCA base to reject
before even accepting as available) both MCA v1.0.0 and v2.0.0
components.
B. return a single list to the caller with both MCA component
versions in the list. Pretty much the same as A, but it scales
better if we get in the business of changing the MCA version a lot
(please God no); I mention it mainly for completeness.
C. return a single list to the caller with all components
"upgraded" to MCA v2.0. This seems like a nice solution -- a la the
experiment we tried with coll a long time ago to prove to ourselves
that run-time versioning could work (for those of you who don't
remember: we had some coll v1.0.0 and some v1.1.0 components; the
coll base transparently handled everything at run-time). However,
there's a problem with this idea: since all frameworks use the
component struct as a "super" for their component structs, the MCA
base does not know the total size of the component public struct.
So it cannot "upgrade" the MCA v1.0.0 structure in memory to a
v2.0.0, because the v2.0.0 struct is bigger than the v1.0.0 struct.
So we can't just magically treat everything as v2.0.0 components at
the MCA base level; we'd have to have the frameworks transmogrify
their own components (although we might be able to have some MCA
base helper function that does the heavy lifting, as long as the
framework supplied the total struct length).
Note that all three of these solutions involve touching every
framework in some way (although not every component).
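
To make the size problem in option C concrete, here is a rough sketch of what such an MCA base helper could look like. Everything in it is hypothetical: the base_v1 / base_v2 layouts, the foo framework structs, and the helper's signature are invented for illustration. It also assumes the framework passes member offsets as well as total lengths, because padding means the framework-specific part does not necessarily start at sizeof(base).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical base component headers; the v2 one grew a field. */
struct base_v1 { int mca_major; int mca_minor; const char *name; };
struct base_v2 { int mca_major; int mca_minor; const char *name; int extra_flags; };

/* Hypothetical Foo framework component, as built against each base.
 * The base struct is the "super": it is always the first member. */
struct foo_component_v1 { struct base_v1 super; int (*foo_init)(void); };
struct foo_component_v2 { struct base_v2 super; int (*foo_init)(void); };

/* Base-level helper: build a v2-layout copy of a v1 component. The base
 * cannot know the framework's sizes or offsets, so the caller supplies
 * them. Returns malloc'ed memory, or NULL on failure. */
static void *upgrade_to_v2(const void *old, size_t old_total, size_t old_tail_off,
                           size_t new_total, size_t new_tail_off)
{
    const struct base_v1 *old_base = (const struct base_v1 *) old;
    struct base_v2 *new_base;
    unsigned char *fresh;
    size_t tail_len;

    assert(old != NULL);
    assert(old_tail_off <= old_total && new_tail_off <= new_total);

    tail_len = old_total - old_tail_off;
    if (tail_len > new_total - new_tail_off) {
        return NULL;                 /* framework part would not fit */
    }
    fresh = calloc(1, new_total);
    if (fresh == NULL) {
        return NULL;
    }

    /* Translate the base header; new fields get a default value. */
    new_base = (struct base_v2 *) fresh;
    new_base->mca_major = 2;
    new_base->mca_minor = 0;
    new_base->name = old_base->name;
    new_base->extra_flags = 0;

    /* Copy the framework-specific part byte for byte. */
    memcpy(fresh + new_tail_off, (const unsigned char *) old + old_tail_off, tail_len);
    return fresh;
}

/* What the Foo framework itself would have to provide. */
static struct foo_component_v2 *foo_upgrade(const struct foo_component_v1 *old)
{
    return upgrade_to_v2(old, sizeof(*old),
                         offsetof(struct foo_component_v1, foo_init),
                         sizeof(struct foo_component_v2),
                         offsetof(struct foo_component_v2, foo_init));
}
```

Even with the helper doing the copying, each framework still has to write its own foo_upgrade-style wrapper, which is the "touching every framework" cost mentioned above.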
----
All that being said, I suppose there are two arguments against these
kinds of issues:
- this situation probably won't happen in practice (component A
compiled against MCA v1.0.0 and against MCA v2.0.0) because we only
distribute components as part of full OMPI releases, and therefore
they're fairly tightly bound to their MCA version. However, for
components that didn't change between OMPI v1.2 and v1.3, you *will*
have this scenario, but in different OMPI installation directories
(and therefore it pretty much doesn't matter).
- I think the crux of Brian's argument is the framework's version
number is identifying *the framework's* interface -- not the whole
interface (i.e., not including the MCA base interface). From this
perspective, it *is* independent of the MCA version number.
Specifically: the version of the framework interface is independent
of the binary compatibility and features issues surrounding the MCA
base.
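
One way to picture the split Brian describes (MCA and framework versions decide whether a component can be loaded; the component version only identifies it) is the sketch below. The struct and function names are made up for illustration, and comparing only major.minor for loadability is an assumption, not a statement of what the MCA base actually does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical version record; field names avoid the major()/minor()
 * macros that some system headers define. */
struct version { int major_v; int minor_v; int release_v; };

/* The three versions carried by every component. */
struct component_versions {
    struct version mca;        /* MCA base interface */
    struct version framework;  /* framework interface, independent of MCA */
    struct version component;  /* the component's own version */
};

static bool same_major_minor(struct version a, struct version b)
{
    return a.major_v == b.major_v && a.minor_v == b.minor_v;
}

static bool same_version(struct version a, struct version b)
{
    return same_major_minor(a, b) && a.release_v == b.release_v;
}

/* Loadability depends on the first two versions only (assumed policy:
 * major.minor must match for both). */
static bool can_load(const struct component_versions *c,
                     struct version my_mca, struct version my_framework)
{
    assert(c != NULL);
    return same_major_minor(c->mca, my_mca) &&
           same_major_minor(c->framework, my_framework);
}

/* Two builds are the "same" component only if the whole
 * (MCA, framework, component) triple matches. */
static bool same_identity(const struct component_versions *a,
                          const struct component_versions *b)
{
    assert(a != NULL && b != NULL);
    return same_version(a->mca, b->mca) &&
           same_version(a->framework, b->framework) &&
           same_version(a->component, b->component);
}
```

With this split, a coll v1.2.3 component built against MCA v1.0.0 and the same one built against MCA v2.0.0 share a framework version but are not the same component, and only one of them is loadable by a given MCA base.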
-----
So Josh and I thought we picked a solution that was clear, simple,
and one-of-several sucky options. :-\ We could probably be
convinced to go another way if someone has strong feelings.
--
Jeff Squyres
Cisco Systems