Re: VERSION as (interface,revision) pair and CPAN++

David Manura Sat, 24 Jan 2004 11:29:26 -0800

Fergal Daly wrote:

>...

>> The previous two points show that version numbers are now being used for TWO things, neither well: (a) to (partly) describe the amount of code change and (b) to (partly) describe interface/behavioral compatibility. I believe that one or the other should be chosen rather than both.

> Are you saying that having split our current version number into 2 parts, I
> should have actually split it into 3? One to indicate the interface, one to
> indicate the revision and one to indicate how much code changed?
>
> These 3 things do need to be expressed somehow but I'm not sure that "how
> much code changed" can be expressed as a number. Perhaps we need an
> "internal version" number which tracks the revisions of how the code works
> rather than what it does. This is useful for users of the module but not very
> useful to automated tools - they just need to know if it's compatible; they
> cannot possibly make any decisions based on how different the internals are
> because they don't care about the internals.
>
> One way might be to keep abstract interface numbers separate from your
> concrete implementations. You could name your interfaces "1.2if", "1.3if",
> etc. You would never actually release anything with its version set to
> "1.2if". You would release My::Module-1.2_1, which declares itself compatible
> with "1.2if"; then some day in the future you rewrite it and release
> My::Module-2.0_1, which is also compatible with "1.2if".
>
> You would have to strongly encourage your users to only specify abstract
> versions in their use statements.
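
To make the abstract-interface idea quoted above concrete, here is a minimal Perl sketch. The `@IMPLEMENTS` array and the `implements` class method are hypothetical names chosen for illustration; they are assumptions, not Version::Split's actual API.

```perl
use strict;
use warnings;

package My::Module;

# Concrete release number (hypothetical); client code should not ask for it.
our $VERSION = '2.0_1';

# Abstract interfaces this release declares it honours (hypothetical metadata).
our @IMPLEMENTS = ('1.2if');

# Class method: true if this release declares the given abstract interface.
sub implements {
    my ($class, $interface) = @_;
    return scalar grep { $_ eq $interface } @IMPLEMENTS;
}

package main;

my $ok = My::Module->implements('1.2if');
print "My::Module declares 1.2if: ", ($ok ? 'yes' : 'no'), "\n";
```

A later rewrite released as 2.0_1 keeps '1.2if' in its list, so clients that only ever name the abstract version are unaffected by the change in concrete release number.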

Let's simplify things for the moment and consider what the version number would look like if it expressed only interface changes (i.e. no bug fixes or internal refactoring).

In this case, the version number must be able to express two cases, maybe three:

(1) All code that works with Version A will also work with subsequent Version B. (e.g. adding new functions)

(2) There exists code that works with Version A but will not work with Version B. (e.g. changing existing function signatures)

(3) There exists code that works with Version A, will not work with Version B, but will work with an even later Version C. (probably a rare case)

To handle #1 and #2, we could require all interface version numbers to be of the form x.y such that, for any two interface numbers x.y and u.v where u.v is the later one, assertion #1 is true iff x = u and v >= y, and assertion #2 is true iff the opposite holds (i.e. x != u or v < y). There is then no use for long version numbers such as 1.2.3.4, as mentioned.
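
The x.y rule is small enough to write down directly. This is a minimal sketch assuming interface versions are plain "x.y" strings; `interface_compatible` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Returns true iff client code written against interface $required (x.y)
# is guaranteed to work with interface $installed (u.v): x == u and v >= y.
sub interface_compatible {
    my ($required, $installed) = @_;
    my ($x, $y) = split /\./, $required;
    my ($u, $v) = split /\./, $installed;
    return $x == $u && $v >= $y;
}

for my $pair (['1.2', '1.5'], ['1.2', '1.1'], ['1.2', '2.0']) {
    my ($req, $inst) = @$pair;
    printf "%s -> %s: %s\n", $req, $inst,
        interface_compatible($req, $inst) ? 'compatible' : 'incompatible';
}
```

Because the two parts are compared as separate integers, 1.10 counts as later than 1.9 here, unlike a plain numeric comparison of the whole string.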

To handle #3, which should be rarer under this new proposal, the module will probably need to provide a compatibility map, as suggested:

  use Version::Split qw(
      2.1 => 1.1
  );

That is, code compatible with 1.1 is compatible with 2.1 but might not be compatible with 2.0, for example if 2.0 removed a function present in 1.1 and 2.1 restored it. Furthermore, code compatible with 1.2 may or may not be compatible with 2.1. The above use statement would consider them incompatible, but how would we express compatibility if they actually are compatible? Could we do this?

  use Version::Split qw(
      2.1 => 1.2
  );

Now, code compatible with 1.2 is known to be compatible with 2.1. Code compatible with 1.1 (or 1.0) is implicitly known to be compatible with 1.2, which in turn is known to be compatible with 2.1. Code known to be compatible only with 1.3, however, is still considered incompatible with 2.1. The mapping is one-directional: it does not imply that code compatible with 2.1 is compatible with 1.2, only the reverse.
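
Here is one way the compatibility-map reasoning could be checked mechanically. It is a sketch under assumptions: `%compat_map` stands in for the `use Version::Split qw(2.1 => 1.2)` declaration, and `compatible` is a hypothetical helper, not how Version::Split actually works. A map entry only extends compatibility from the older version to the newer one, and entries can be chained.

```perl
use strict;
use warnings;

# Same-series rule: code written for x.y works with u.v iff x == u and v >= y.
sub same_series_ok {
    my ($req, $inst) = @_;
    my ($x, $y) = split /\./, $req;
    my ($u, $v) = split /\./, $inst;
    return $x == $u && $v >= $y;
}

# Hypothetical map mirroring "use Version::Split qw(2.1 => 1.2)":
# code compatible with the value (1.2) is also compatible with the key (2.1).
my %compat_map = ('2.1' => '1.2');

# True if code written for $req is known to work with $inst, either by the
# same-series rule directly or by chaining through map entries.
sub compatible {
    my ($req, $inst, $map, $seen) = @_;
    $seen ||= {};
    return 1 if same_series_ok($req, $inst);
    for my $later (sort keys %$map) {
        next if $seen->{$later}++;    # visit each entry once (no cycles)
        # $inst accepts code for $later, which accepts code for $map->{$later}.
        return 1 if same_series_ok($later, $inst)
                 && compatible($req, $map->{$later}, $map, $seen);
    }
    return 0;
}

for my $req (qw(1.0 1.1 1.2 1.3 2.1)) {
    printf "code for %s with 2.1: %s\n", $req,
        compatible($req, '2.1', \%compat_map) ? 'ok' : 'refused';
}
```

With this map, code for 1.0, 1.1, 1.2 and 2.1 is accepted by 2.1 and code for 1.3 is refused; `compatible('2.1', '1.2', \%compat_map)` is false, and 1.1 code would also be accepted by a later 2.2 through the same entry.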

I think this scheme holds for interface changes. I'd be interested to see some real examples where #3 has occurred.

> Are you saying that having split our current version number into 2 parts, I
> should have actually split it into 3? One to indicate the interface, one to
> indicate the revision and one to indicate how much code changed?

I questioned combining the interface version and the amount-of-code-change version into one number. However, could we combine the bug-fix number and the amount-of-code-change number? Are these really different? A major internal refactoring could be fixing bugs even if we never discover them. It could be adding new bugs as well, but bug fixes can also inadvertently introduce new bugs. I propose these two be combined, perhaps as x.y_n, where x.y is the refactoring part and n is the bug fix, or just as x.y.z to eliminate the distinction altogether.

Given a combined refactoring+bugfix number, does the number hold any significance? You would expect 1.2.15 to be more stable than 1.2.14, as it probably fixed a bug. Alternatively, it might have made a small change to an algorithm, i.e. refactoring. We don't know. We would also expect 2.0.1 to be better implemented/designed than 1.2.14, as the 2.x effort probably did some major refactoring, possibly at the initial expense of stability. However, how does 2.1.79 compare with 1.2.14 in terms of stability? It's difficult to say from the numbers alone, and the two tasks of bug fixing and refactoring can occur simultaneously. We might say that x.y.z is more stable than u.v.w iff y > v or (y = v and z > w). However, it's not clear whether y and v really represent code change or stability; we're mixing two things.

But should we care about this confusion? We might want an automated tool to update our systems but only download bug fixes (or certain classes of bug fixes, such as security patches) in the case of a production system. Trying to store this information in a single number, rather than in metadata, might not be appropriate. So we might just let the ambiguity among 1.2.13, 1.2.14, and 2.1.79 remain.
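
To illustrate the metadata alternative, here is a sketch in which each release carries a list of change kinds and a cautious updater filters on them. The kind labels (`bugfix`, `security`, `refactoring`), which release gets which label, and the `acceptable_updates` helper are all invented for the example.

```perl
use strict;
use warnings;

# Hypothetical release metadata: rather than encoding "what kind of change"
# in the number, each release lists the kinds of change it contains.
my @releases = (
    { version => '1.2.13', kinds => ['bugfix'] },
    { version => '1.2.14', kinds => ['security', 'bugfix'] },
    { version => '2.1.79', kinds => ['refactoring', 'bugfix'] },
);

# Keep a release only if every kind of change it contains is allowed.
sub acceptable_updates {
    my ($allowed, @candidates) = @_;
    my %ok = map { $_ => 1 } @$allowed;
    return grep { my $rel = $_; !grep { !$ok{$_} } @{ $rel->{kinds} } } @candidates;
}

my @picked = acceptable_updates(['bugfix', 'security'], @releases);
print join(', ', map { $_->{version} } @picked), "\n";
```

A production system allowing only bug fixes and security patches would take 1.2.13 and 1.2.14 but skip 2.1.79, without the numbers themselves having to say anything about stability.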

What does this mean for use statements? When we say

use MyModule 1.2;

we could have it accept any version of MyModule whose interface version is compatible with the interface version associated with refactoring-bugfix version 1.2. As such, the module user might never see the interface version. We might equivalently say

use MyModule 1.4if;

to refer to the more canonical interface version, not the refactoring-bugfix version.
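
Perl already offers a hook for the first form: `use MyModule 1.2;` calls `MyModule->VERSION(1.2)` at compile time, and a module may define its own `VERSION` method. The sketch below uses that hook to read 1.2 as a refactoring-bugfix release number and check the interface behind it. The variables `$RELEASE`, `$INTERFACE` and `%INTERFACE_OF`, and the release-to-interface mapping itself, are hypothetical. The `1.4if` form would need new syntax, since Perl's `use` expects a numeric or v-string version there, so it is not covered.

```perl
use strict;
use warnings;

package MyModule;

# Hypothetical metadata: this is refactoring-bugfix release 2.1.79,
# implementing interface 1.4. Each known release maps to its interface.
our $RELEASE      = '2.1.79';
our $INTERFACE    = '1.4';
our %INTERFACE_OF = ('0.9' => '0.1', '1.1' => '1.2', '1.2' => '1.3', '2.1' => '1.4');

# "use MyModule 1.2;" calls MyModule->VERSION(1.2). Overriding VERSION lets the
# module treat 1.2 as a release number and check the interface behind it.
sub VERSION {
    my ($class, $wanted) = @_;
    return $RELEASE unless defined $wanted;
    my $needed = $INTERFACE_OF{$wanted}
        or die "MyModule: no interface recorded for release $wanted\n";
    my ($x, $y) = split /\./, $needed;
    my ($u, $v) = split /\./, $INTERFACE;
    die "MyModule: release $wanted needs interface $needed, have $INTERFACE\n"
        unless $x == $u && $v >= $y;
    return $RELEASE;
}

package main;

print "MyModule->VERSION('1.2') returned ", MyModule->VERSION('1.2'), "\n";
```

Here a request for release 1.2 succeeds because its interface (1.3) is satisfied by interface 1.4, while a request for 0.9 dies because interface 0.1 belongs to a different series.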

>> (3)
>>
>> Version::Split is conservative (safe) in its detection of version incompatibilities, with a relatively low number of false negatives (theoretically zero, though not in practice) and a relatively high number of false positives. That is, if Version::Split says that my code will remain compatible with a new version of a module, it will most likely be correct. But if Version::Split says there is an incompatibility, there is a good chance it is wrong. Case in point:
>>
>>  # Shape.pm v. 1.1
>>  ...
>>  sub new { ... }
>>  sub get_area { ... }
>>  sub get_color { ... }
>>
>>  # Shape.pm v. 1.2
>>  ...
>>  sub new { ... }
>>  sub get_area { ... }
>>  sub get_colour { ... } # incompatible
>>
>>  # myprogram.pl
>>  # Note: this code does not use get_color(), so it should also work with 1.2.
>>  use Shape 1.1;
>>  my $s = Shape->new();
>>  print $s->get_area();
>>
>> This is not necessarily bad but a tradeoff of generality vs. complexity, something that could be documented in the POD.
> Our current version systems will also be mistaken here - giving a false
> positive for a program that uses get_color(). So Version::Split can only get
> it wrong when current methods would too. Version::Split never gives a false
> positive and therefore never causes a nuclear reactor to explode.

Correct: Version::Split is better than the current systems and never causes a nuclear reactor to explode (in theory). However, there are alternative solutions that would be more lenient in accepting modules yet also never cause a nuclear reactor to explode. They may be slightly or greatly more complicated, but they do exist, which I believe the POD should acknowledge even if Version::Split doesn't address them.

> This type of situation could actually be solved if Shape.pm's author
> made some abstract versions, although the amount of work involved with
> that gets ridiculous as the number of combinations goes up.
>
> The ultimate solution would be to allow something like
>
> use Shape (new => 1.2 1.1) and (get_area 1.2);
>
> but I'm not advocating that and I think we'll all be going to the office in
> flying cars before that could ever happen!

Or we could do

use Shape qw(1.1 new get_area);

meaning that we require a Shape module having the same interface for the new and get_area methods as refactoring-bugfix version 1.1. The versioning module would work out the details, assuming it can determine which methods/functions changed interfaces between versions. Preferably, we wouldn't even have to specify which functions/methods we use:

use Shape 1.1;

but let the versioning module (somehow) figure that out. Unfortunately, that could be a problem, since we might not discover until run-time which methods some code uses:

$shape->$methodname();

I'm not saying this should be done, but I'm not yet ruling out that a convenient way to do it exists. Unfortunately, such a check may need to be done at run-time.
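
For the explicit form `use Shape qw(1.1 new get_area);`, Perl would call `Shape->import('1.1', 'new', 'get_area')`, so Shape.pm itself could do the per-method check. This is a minimal sketch assuming the author ships a hypothetical `%CHANGED_IN` table listing, for each release, the methods whose interface changed; it does nothing about the run-time `$shape->$methodname()` problem.

```perl
use strict;
use warnings;

package Shape;

our $VERSION = '1.2';

# Hypothetical author-supplied metadata: methods whose interface changed in
# each release, relative to the release before it.
our %CHANGED_IN = ('1.2' => ['get_color']);

# Compare two "x.y" versions part by part, as integers.
sub _cmp_version {
    my ($x, $y) = split /\./, $_[0];
    my ($u, $v) = split /\./, $_[1];
    return $x <=> $u || $y <=> $v;
}

# Refuse to load only if a listed method changed after the requested release
# and no later than the installed one.
sub import {
    my ($class, $since, @methods) = @_;
    return unless defined $since;
    my %used = map { $_ => 1 } @methods;
    my @broken;
    for my $release (sort { _cmp_version($a, $b) } keys %CHANGED_IN) {
        next unless _cmp_version($release, $since) > 0
                 && _cmp_version($release, $VERSION) <= 0;
        push @broken, grep { $used{$_} } @{ $CHANGED_IN{$release} };
    }
    die "Shape $VERSION: interface changed since $since for: @broken\n" if @broken;
    return;
}

package main;

# Simulate what "use Shape qw(1.1 new get_area);" and
# "use Shape qw(1.1 get_color);" would do at compile time.
Shape->import('1.1', 'new', 'get_area');
print "accepted: new, get_area\n";
eval { Shape->import('1.1', 'get_color') };
print "refused: $@";
```

The program from the earlier example, which only uses new and get_area, loads against 1.2, while code that lists get_color is refused.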

Here's a related possibility. The following can be used as stated before:

use Shape 1.1;

and if the user has Shape 1.2 installed on their system, then an error like this would be generated at compile time:

  Error: MyModule.pm depends on Shape.pm 1.1, but you currently have Shape.pm 1.2 installed, which is incompatible in the following methods:
    - get_color()
  A call to get_color was not found in MyModule.pm on doing a scan of the source code, but its absence cannot be certain. To override this error and run anyway, please do <insert corrective action here, such as setting a command-line switch>.
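
The "scan of the source code" mentioned in that message could start out as something as crude as the following. This is a regex heuristic, not a Perl parser, and `scan_for_method` is a hypothetical name. It treats any mention of the name as a possible call (erring on the side of refusing) and reports "uncertain" when it sees a dynamic method call, which is why such a message cannot claim the call is truly absent.

```perl
use strict;
use warnings;

# Crude source scan:
#   - any mention of the name (even in a comment) counts as "found";
#   - a dynamic call such as $shape->$methodname() makes the result
#     "uncertain", since the method name is only known at run time.
sub scan_for_method {
    my ($source, $name) = @_;
    return 'found'     if $source =~ /\b\Q$name\E\b/;
    return 'uncertain' if $source =~ /->\s*\$/;
    return 'not found';
}

my $my_module = <<'END';
use Shape 1.1;
my $s = Shape->new();
print $s->get_area();
END

my $result = scan_for_method($my_module, 'get_color');
print "get_color: $result\n";
```

Note that the word-boundary match keeps get_colour from being mistaken for get_color.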

The purpose of such a message would be either greater leniency or simply a better error message. Rather than reporting all this info at compile time, the module user might alternatively run a command to see an interface "diff" of two versions of a module to determine if/how the problem can be corrected:

perl -MVersion::Split -e "idiff 'Shape', '1.1', '1.2'"

or

perl -MVersion::Split -e "find_changes 'Shape::get_color'"

That would be cool. It would require, though, that Shape.pm v1.2 explicitly state somewhere in its metadata that the "get_color" function of 1.1 is broken in 1.2. I don't believe a tool can, in general, discover this information automatically by examining the code; rather, the author of Shape.pm must provide it manually. This is because "interface changes" here refers not only to simple function signatures (largely absent in Perl) but also to the more vaguely defined external behavior.
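
As a sketch of what that author-supplied metadata and an interface diff might look like: the `%changes` layout, the note text, and this `idiff` helper are all hypothetical (it takes the metadata directly rather than a module name, unlike the command line above). The note is free text because behavioral changes cannot be derived from the code.

```perl
use strict;
use warnings;

# Hypothetical metadata shipped with Shape.pm: for each release, the
# functions that changed incompatibly, with a human-written note.
my %changes = (
    '1.2' => { 'Shape::get_color' => 'replaced by get_colour()' },
);

# Compare two "x.y" versions part by part, as integers.
sub cmp_xy {
    my ($x, $y) = split /\./, $_[0];
    my ($u, $v) = split /\./, $_[1];
    return $x <=> $u || $y <=> $v;
}

# Interface diff: every recorded change after $from, up to and including $to.
sub idiff {
    my ($meta, $from, $to) = @_;
    my @report;
    for my $rel (sort { cmp_xy($a, $b) } keys %$meta) {
        next unless cmp_xy($rel, $from) > 0 && cmp_xy($rel, $to) <= 0;
        for my $func (sort keys %{ $meta->{$rel} }) {
            push @report, "$rel: $func $meta->{$rel}{$func}";
        }
    }
    return @report;
}

print "$_\n" for idiff(\%changes, '1.1', '1.2');
```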

-davidm
