On Wednesday 28 January 2004 05:28, David Manura wrote:
I'm not sure branching maps cleanly onto the interface versioning scheme as shown above. Let's say you have 1.2. You then branch to 1.2.1.1 => 1.2. Meanwhile, in your main trunk, you create 1.3 => 1.2. OK, now back in the branch, say you want to introduce an incompatible change to 1.2.1.1. There are actually two ways in which your change can be incompatible: with respect to 1.2.1.1 only or with respect to 1.2. You provided an example of the first case, where we introduce 1.2.2.1 =/=> 1.2.1.1 yet 1.2.2.1 => 1.2. However, what shall we do if we need to introduce a change incompatible with 1.2? Number it 1.3? We can't do that because 1.3 has already been assigned in the main trunk.
The next version that is not automatically compatible with 1.x is 2.1. This is true in your original scheme and in my version with branches.
If you have branches then the next version that is automatically compatible with 1.2 but not with 1.2.2.1 is 1.2.3.1.
My error. What I meant to say was this:
Let's say you have 1.2. You then branch to 1.2.1.1 => 1.2. Meanwhile, in your main trunk, you create 2.1 =/=> 1.2. OK, now back in the branch, say you want to introduce an incompatible change to 1.2.1.1. There are actually two ways in which your change can be incompatible: with respect to 1.2.1.1 only or with respect to 1.2. You provided an example of the first case, where we introduce 1.2.2.1 =/=> 1.2.1.1 yet 1.2.2.1 => 1.2. However, what shall we do if we need to introduce a change incompatible with 1.2? Number it 2.1? We can't do that because 2.1 has already been assigned in the main trunk. In fact, if we assign it anything of the form x.y (i.e. two numbers rather than four), it can no longer be distinguished as part of the branch rather than as part of the main trunk.
Maybe the branch numbers should be of the form
x.y.b.u.v
where x.y is the main trunk revision, b is the branch number, and u.v is the branch revision. For simplicity, we'll also eliminate the distinction between changes that are incompatible only with the current branch revision and changes that are incompatible with the main trunk revision. The scheme for x.y will be exactly the same as, yet independent of, the scheme for u.v. So, the following relations are implicit:
1.2.1.1.1 ===> 1.2
1.2.1.1.2 ===> 1.2.1.1.1
1.2.1.2.1 =/=> 1.2 (note!)
why is this =/=> when ...
1.2.1.2.2 ===> 1.2.1.2.1
1.2.2.1.1 ===> 1.2 (a second branch)
... this one is ===> ?
1.2.1.1.1 and 1.2.2.1.1 are both of the first branch revision (i.e. 1.1) within their branches. The first revision of a branch is defined as implicitly compatible with the trunk revision (i.e. 1.2). The magnitude of the branch number (i.e. 1 and 2 respectively) has no meaning in the implicit compatibility rules. So, you can create any number of branches off of 1.2, and each of these branches will start out as being implicitly compatible with 1.2. Here are two more examples to clarify the independence of branches:
1.2.2.x.y =/=> 1.2.1.u.v
1.2.1.u.v =/=> 1.2.2.x.y
for all x, y, u, v.
1.2.2.1.2 ===> 1.2.2.1.1
1.2.2.2.1 =/=> 1.2.2.1.2
1.3 =/=> 1.2.b.u.v for all b, u, v
1.3 ===> 1.2
This seems workable, but it's getting more complicated. The question is, will anyone use this? Also, are numbers the best way to express this information?
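To make the implicit rules of the x.y.b.u.v scheme concrete, here is a minimal sketch in Python, used purely for illustration. The function names (`parse`, `trunk_implies`, `implies`) are hypothetical, and the sketch assumes compatibility is transitive, so a later revision in the first series of a branch is still compatible with the trunk revision it was forked from.

```python
def parse(version):
    """Split '1.2' or '1.2.1.1.1' into a tuple of ints."""
    parts = tuple(int(p) for p in version.split("."))
    if len(parts) not in (2, 5):
        raise ValueError(f"expected x.y or x.y.b.u.v, got {version!r}")
    return parts


def trunk_implies(a, b):
    """Trunk rule: same major number, equal or later minor number."""
    return a[0] == b[0] and a[1] >= b[1]


def implies(a, b):
    """Return True if version a is implicitly compatible with b (a ===> b)."""
    a, b = parse(a), parse(b)
    if len(a) == 2 and len(b) == 2:
        return trunk_implies(a, b)
    if len(a) == 2:
        # A trunk revision is never implicitly compatible with a branch revision.
        return False
    if len(b) == 5:
        # Branch vs. branch: same trunk base, same branch, same u, v not older.
        return a[:4] == b[:4] and a[4] >= b[4]
    # Branch vs. trunk: only the first branch series (u == 1) inherits
    # compatibility from the trunk revision it was forked from (assumption:
    # this carries over to every v in that series by transitivity).
    return a[3] == 1 and trunk_implies(a[:2], b)
```

Under these assumptions, `implies("1.2.1.2.1", "1.2")` is false while `implies("1.2.2.1.1", "1.2")` is true, which matches the relations listed above.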
It's certainly confusing me and I don't think it will be widely used.
Agree. The subsequent solution is conceptually quite similar but I believe more practical.
The branch identifier 1.2.1 might alternately be labeled something more meaningful like "unstable." So, the above scheme might be rewritten
unstable-1.1 ===> 1.2 (must be declared explicitly)
unstable-1.2 ===> unstable-1.1
unstable-2.1 =/=> unstable-1.2
unstable-2.2 ===> unstable-2.1
mycopy-1.1 ===> 1.2 (must be declared explicitly)
mycopy-1.2 ===> mycopy-1.1
mycopy-2.1 =/=> mycopy-1.2
1.3 =/=> unstable-*.* (unless otherwise declared explicitly)
1.3 =/=> mycopy-*.* (unless otherwise declared explicitly)
1.3 ===> 1.2
Now, say after merging unstable into 1.4 that you want to branch again, then you just declare this explicitly and continue:
unstable-3.1 ===> 1.4
Use of branch names rather than branch numbers will also reduce the possibility of conflicts when there is no central assignment of branch identifiers (e.g. when I create my own private version of a standard module and name the branch "davidm", unbeknown to the module author).
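The named-branch variant can be sketched the same way. This is an illustrative Python sketch, not an implementation of any existing module: `parse`, `compatible`, and the `declared` table are hypothetical names, and it assumes that an explicit declaration made for one branch revision is inherited by later revisions in the same series (e.g. unstable-1.2 inherits the declaration made for unstable-1.1).

```python
import re

_VERSION = re.compile(r"^(?:(?P<branch>[A-Za-z]\w*)-)?(?P<major>\d+)\.(?P<minor>\d+)$")


def parse(version):
    """Return (branch, major, minor); branch is None for trunk versions."""
    m = _VERSION.match(version)
    if m is None:
        raise ValueError(f"unrecognised version {version!r}")
    return m["branch"], int(m["major"]), int(m["minor"])


def compatible(a, b, declared):
    """True if a ===> b, given explicit declarations {branch version: trunk version}."""
    a_branch, a_major, a_minor = parse(a)
    b_branch, b_major, b_minor = parse(b)
    if a_branch == b_branch:
        # Same line of development: the usual major/minor rule applies.
        return a_major == b_major and a_minor >= b_minor
    if a_branch is None or b_branch is not None:
        # Trunk-to-branch and branch-to-branch are never implicit.
        return False
    # Branch-to-trunk: needs a declaration made by this revision or an earlier
    # one in the same branch series, whose trunk target covers b.
    for source, target in declared.items():
        s_branch, s_major, s_minor = parse(source)
        t_branch, t_major, t_minor = parse(target)
        inherited = (s_branch, s_major) == (a_branch, a_major) and a_minor >= s_minor
        covers = t_branch is None and t_major == b_major and t_minor >= b_minor
        if inherited and covers:
            return True
    return False
```

With `declared = {"unstable-1.1": "1.2", "unstable-3.1": "1.4"}`, the call `compatible("unstable-3.1", "1.4", declared)` succeeds only because of the explicit entry; without it, nothing ties the branch name to the trunk.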
Yes, once you stop trying to do numerical things to version strings (namely expecting < and > to mean something) then you are no longer forced to use numbers, you can use something more expressive. However numbers are still very common and already have some useful meanings so I want to get the numbers out of the way first and then consider more general strings.
I was thinking of making only "imp1.imp2_bug1.bug2" part of the identifier for the distribution file to download, as is currently the case. So, as usual, people can say "I need to download MyModule-1.2_3," and this will uniquely identify the correct file to download. The interface number (or *multiple* interface numbers), however, will be embedded, possibly hidden, inside the module so that "use" will work correctly. The interface numbers might exist as well in the POD to give the user a heads-up, but this is not strictly necessary (if there's a problem, the module user will find out upon compilation). Although not required and maybe not always practical, the module author may even attempt to synchronize the implementation number with the interface number to make things simpler. Therefore, 1.x implementations will implement 1.x versions of the interface, while 2.x implementations will implement 2.x versions of the interface. This may be possible since the module author has full freedom in assigning implementation version numbers (except for the requirement that they be strictly increasing).
I can think of 2 disadvantages to not using the interface revision in the use statement and in the distribution name.
1 The user will get no compatibility information by just looking at the version. This can also be the case with the interface version but there are many cases where for example the user can immediately say "it's 1.3 and I'm writing for 1.2 so that's ok to install". Putting the information in the POD or metadata makes that harder.
True. My reasons for disposing of the interface number in the distribution name are, first, that a module might implement multiple versions of an interface, and, second, that the interface number is redundant: under my scheme the interface number can be deduced from the implementation number upon examining the module. The second point is kind of analogous to naming a distribution "mydistribution-2.1-20040128.tar.gz" vs. "mydistribution-2.1.tar.gz". Here the date is redundant, but I admit it can still be useful to the user. The same can be said of interface numbers.
2 There will be more implementations than interfaces, so we would need to declare and store more compatibility information. I think we would also need to keep metadata about more modules. When using implementation versions as the key, if a version is deleted from CPAN, the metadata for that version must still be kept so that anyone asking for that version can figure out its hidden interface value. Whereas if everyone is requesting interface versions, then I don't have to keep any metadata around for deleted distributions.
You have a point here. If some code says
use MyModule qw(1.5impl);
which let's say is equivalent to
use MyModule qw(3.1if);
then the mapping 1.5impl <--> 3.1if must be maintained somewhere. Most likely, the latest version of the module must maintain a history of all previous implementation <--> interface mappings. This would be extra tedium on the module author's part.
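As a rough illustration of the bookkeeping this implies, the history could be a simple lookup table shipped with the latest release. The sketch below is in Python for illustration only; `IMPLEMENTS`, `interfaces_for`, and `implementations_of` are hypothetical names, and only the 1.5impl to 3.1if pair comes from the example above (the other rows are invented to fill out the table).

```python
# Hypothetical history table kept inside the latest release of MyModule.
# Keys are implementation versions; values are the interface versions each
# one provides. Only the 1.5 -> 3.1 row comes from the discussion.
IMPLEMENTS = {
    "1.3": ["2.1"],
    "1.4": ["2.1", "2.2"],
    "1.5": ["3.1"],
}


def interfaces_for(impl_version):
    """Translate a requested implementation version into interface versions."""
    try:
        return IMPLEMENTS[impl_version]
    except KeyError:
        # The mapping was never recorded (or was dropped along with the release).
        raise LookupError(f"no interface history for {impl_version}impl") from None


def implementations_of(interface_version):
    """Reverse lookup: every recorded implementation providing an interface."""
    return [impl for impl, ifaces in IMPLEMENTS.items() if interface_version in ifaces]
```

The failure case is the interesting one: once a row is lost, a request for that implementation version can no longer be translated, which is the maintenance burden described above.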
I question, though, why you would want to specify implementation versions, instead of just interface versions, inside the use statement. If a module implements the correct interface, it -theoretically- should work.
In general it's a bad idea but you may want the version that uses XML::Twig instead of XML::Parser for some reason.
I'm going to stop thinking about this issue and just see if I can get the basics to work well and sensibly.
Here's a question: If only implementation 2.0 is installed, and this implements interface 2.3, then will the following fail or succeed?
use MyModule 2.3_1.0;
(I believe it should succeed.)
It should be for the user to decide and I think you have to assume they really really wanted 1.0, otherwise what was the point of mentioning it?
Correct. But then it gets fuzzy. Since implementation numbers are not as rigorously defined as interface numbers, it's not clear whether 1.0.1impl or 1.1impl will suffice for 1.0impl.
If I've released 5.2_1.2 and 6.7_1.2, which one am I referring to in the "use" above? Bugfix numbers would have to be unique.
Just to make sure we're on the same track (it starts to get confusing :)), the 1.2 in the above use statement refers to the implementation number.
Possibly it should be clarified as this, at least for discussion:
use MyModule qw(1.2impl);
I take 5.2 and 6.7 above to be interface numbers, and 1.2 to be an implementation number.
Yes
In this case, you're saying that implementation 1.2 implements two interfaces.
No. Right now, if I saw Module-5.2_1.2_1.tgz and Module-6.7_1.2_1.tgz, I would not draw any conclusion from the 1.2 being in both of them, because I'm used to thinking of version numbers as hierarchical. In your scheme, though, you could not have the above, because there can be only one 1.2 implementation. That is fine in itself but is different from the current way of doing things.
Basically we've gone from needing the version string to be unique to needing the implementation+revision to be unique. Actually that's business as usual in a world where the interface version does not appear in the main version string.
I'll acknowledge one if you show me one :-) Version::Split is as lenient as possible given the information at hand.
Correct, at compile time and "given the information at hand." Actually, this is not strictly correct: Version::Split -could- do deep source code analysis on the module that uses it in order to obtain more info--very unlikely but "possible."
When you throw eval into the mix you basically end up with the "halting problem" - can you write a computer program (call it Halt) that can predict whether another program will finish or not. You can show that there is no program Halt that can do the job for every program you throw at it.
Of course Perl is a special case - you don't even have to get theoretical. I saw a program by Mark Jason Dominus that compiles differently depending on the phase of the moon! That man needs help :-)
F
Point taken. Here's a much better example....
Say MyModule-1.1if implements functions A and B. Moreover, MyModule-2.1if implements functions A, B, and C except that B is incompatible with 1.1if. Furthermore, MyModule-3.1if implements A, B, and C except C is incompatible with 2.1if.
Now, say the user's code, presumably written before MyModule-3.1if was even available, is as follows and utilizes the "logic" operations suggested in one of your previous messages.
# myprogram.pl
use MyModule qw(1.1if and 2.1if);
MyModule::A();
We could have instead written "1.1if OR 2.1if" and understood it to mean "myprogram.pl will accept 1.1if or 2.1if." However, for reasons later to become apparent, I use intersection (and) here so as to mean "myprogram.pl is compatible with 1.1if AND 2.1if."
Now, say the user upgrades to MyModule-3.1if. The above code then complains that it's not compatible with 3.1if. However, it can be proven to be compatible! The proof is as follows:
Let "interfaces" be represented as sets containing the features they implement. For example,
1.1if = {A1, B1}
2.1if = {A1, B2, C1}
3.1if = {A1, B2, C2}
Let Xif denote the interface expected by myprogram.pl. Xif is stated to be a subset of the -intersection- of 1.1if and 2.1if:
Xif subset_of 1.1if intersection 2.1if.
For example,
Xif = {A1}.
Further, the author of 3.1if can program it to know that the incompatible changes from 2.1if to 3.1if are disjoint with 1.1if:
broken(2.1if, 3.1if) intersection 1.1if = null_set.
where we define the function "broken" using set difference:
broken(M, N) = M - N.
For example,
broken(2.1if, 3.1if) = {C1}.
Now we get to the main argument:
broken(2.1if, 3.1if) intersection 1.1if = null_set
implies (2.1if - 3.1if) intersection 1.1if = null_set
implies (2.1if intersection not 3.1if) intersection 1.1if = null_set
implies (1.1if intersection 2.1if) intersection not 3.1if = null_set
implies (1.1if intersection 2.1if) subset_of 3.1if.
Therefore,
Xif subset_of (1.1if intersection 2.1if) subset_of 3.1if. Q.E.D.
In words, myprogram.pl is deduced to be compatible with MyModule-3.1if.
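The argument can be checked mechanically with ordinary set operations. The following is a small Python sketch for illustration; the helper names (`premise`, `conclusion`, `subsets`) are mine, and the exhaustive loop over a three-element universe is only a sanity check of the implication, not a substitute for the proof above.

```python
from itertools import combinations


def broken(m, n):
    """Features of interface m that interface n no longer provides (M - N)."""
    return m - n


def premise(i1, i2, i3):
    """broken(i2, i3) intersection i1 = null_set"""
    return not (broken(i2, i3) & i1)


def conclusion(i1, i2, i3):
    """(i1 intersection i2) subset_of i3"""
    return (i1 & i2) <= i3


def subsets(universe):
    """Every subset of a small universe, as a list of sets."""
    items = sorted(universe)
    return [set(c) for r in range(len(items) + 1) for c in combinations(items, r)]


# The interfaces from the example, as sets of versioned features.
if_1_1 = {"A1", "B1"}
if_2_1 = {"A1", "B2", "C1"}
if_3_1 = {"A1", "B2", "C2"}
x_if = {"A1"}  # what myprogram.pl actually relies on

# Sanity check: whenever the premise holds, the conclusion holds too,
# for every triple of subsets of a three-element universe.
all_sets = subsets({"p", "q", "r"})
lemma_holds = all(
    conclusion(a, b, c)
    for a in all_sets for b in all_sets for c in all_sets
    if premise(a, b, c)
)
```

For the example interfaces, `premise(if_1_1, if_2_1, if_3_1)` holds because the only broken feature is C1, which 1.1if never had, and so `x_if <= if_3_1` follows.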
The concept of interface intersection may still need to be more formalized and related to actual Perl code, but that is the general argument. The logic will be further complicated if this is allowed:
# myprogram2.pl
use MyModule qw(1.1if and 2.1if);
MyModule::A();
MyModule::B();
where the particular call to B() supposedly works under both 1.1if and 2.1if even though B breaks between these versions. For example, say
# 1.1if
sub B { return join '', @_; }
# 2.1if
sub B { return join('', @_) x 2; }
Both happen to behave identically when myprogram2.pl does
MyModule::B();
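A Python analogue makes the coincidence easy to see. It assumes B is meant to concatenate its arguments (and, under 2.1if, to repeat the result twice); `b_v1` and `b_v2` are hypothetical stand-ins for the two versions of B.

```python
def b_v1(*args):
    """Analogue of B under 1.1if: concatenate the arguments."""
    return "".join(args)


def b_v2(*args):
    """Analogue of B under 2.1if: concatenate, then repeat the result twice."""
    return "".join(args) * 2


# With no arguments both versions return the empty string, so this one call
# cannot tell the interfaces apart; any non-empty argument list can.
same_without_args = b_v1() == b_v2()
differs_with_args = b_v1("a", "b") != b_v2("a", "b")
```

So whether a call is "compatible with both interfaces" depends on the arguments of that particular call, not just on which function is called.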
In the particular case of MyModule 1.1if, 2.1if, and 3.1if under "use MyModule qw(1.1if and 2.1if)", such possibilities make zero difference to the correctness. However, in general, the logic becomes further complicated this way, and I haven't explored it fully. It might be a bad idea for myprogram.pl to contain a call to B due to the different semantics, but that doesn't prevent the programmer from making such a call. Perhaps, then, constructs like this should be discouraged when interfaces on functions are modified without those functions being renamed:
use MyModule qw(1.1if and 2.1if);
-davidm.