2009/6/12 Ben Finney <[email protected]>:
> I realise now that this has an unintended effect: that version strings
> which have letters in differing case will compare ASCIIbetically, which
> may be non-obvious:
>
>     1.2.C1
>     1.2.D1
>     1.2.REV876
>     1.2.a1
>     1.2.b1
>     1.2.rev543
>
> I hereby simplify the above specification and its semantics, by
> declaring upper-case letters outside the scope of a version string. A
> component can have characters from the set [0-9a-z], removing the above
> cases of non-obvious comparison.
>
>     1.2.a1
>     1.2.b1
>     1.2.c1
>     1.2.d1
>     1.2.rev543
>     1.2.rev876
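For illustration only: Python's default string ordering compares by code point, so plain sorted() on the version strings reproduces the ASCIIbetical order described above, and restricting components to lower case gives the intuitive order. A minimal check, assuming the versions are compared as whole strings:

```python
# Python compares str values by Unicode code point, so every upper-case
# ASCII letter sorts before every lower-case one ("ASCIIbetical" order).
versions = ["1.2.rev543", "1.2.a1", "1.2.REV876", "1.2.b1", "1.2.D1", "1.2.C1"]
print(sorted(versions))
# ['1.2.C1', '1.2.D1', '1.2.REV876', '1.2.a1', '1.2.b1', '1.2.rev543']

# With upper case excluded, the same string comparison looks natural.
lowered = ["1.2.rev876", "1.2.d1", "1.2.a1", "1.2.rev543", "1.2.c1", "1.2.b1"]
print(sorted(lowered))
# ['1.2.a1', '1.2.b1', '1.2.c1', '1.2.d1', '1.2.rev543', '1.2.rev876']
```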
One other aspect of standard practice that I've just realised your rules don't cover is version strings that differ in length. The normal lexicographic "shortest is earliest" rule doesn't work properly here, because it puts the shorter string first:

    1.2a1 vs 1.2

(I hope everyone agrees that 1.2a1 is earlier.) Even adding a dot,

    1.2.a1 vs 1.2

compares wrongly (and it gets worse when you add in 1.2.1...).

Here's an alternative suggestion:

* Versions are treated as dot-separated tuples.
* Comparison is component-by-component, exactly as Python tuples compare.
* Components must have the form [a-z]*[0-9]+([a-z][0-9]+)? (i.e., optional leading letters, an integer, and an optional "letter-integer" suffix).
* Call the three parts "prefix" ([a-z]*), "number" ([0-9]+), and "suffix" ([a-z][0-9]+).
* Components compare as follows:
  - Components with differing prefixes are incomparable[1]. Otherwise, ignore the prefix.
  - Within this, sort by the number part (as a number, not as text).
  - Within this, components with a suffix sort BEFORE those without, in the obvious letter-then-number order.

That's a little messy, but I think it follows people's intuition, allows for most of the variations people want, and, most importantly to my mind, isolates the complexity in how *components* sort against each other (the high-level rule is "like tuples", which is simple).

[1] Note that I see the "prefix" as cosmetic. I would expect real projects to use a fixed prefix on a component-by-component basis (1.2.r34567 or 1.2.dev5 or whatever), but never a mix of 1.2.3, 1.2.r1234 and 1.2.dev5. Hence, I have said that mixed prefixes are incomparable. If this causes an outcry, the following rule could be used instead:

  - Components with a prefix sort before components without, in alphabetical order of prefix

but in my view it adds unnecessary complexity (and hence I'd like to see real-world, justified use cases).

Hmm, this doesn't allow for a component which is a SHA ID (something like a Mercurial revision ID).
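A minimal Python sketch of the component rules proposed above, not a reference implementation. It assumes a cmp-style comparison (sorted via functools.cmp_to_key) so that mixed prefixes can raise an error instead of silently getting an order; the names parse_component, parse_version, compare_versions and IncomparableVersions are hypothetical and not part of distutils. It also shows the SHA ID problem concretely: a typical hex ID such as 3f9c2b1e fails the component pattern outright.

```python
import re
from functools import cmp_to_key

# Component grammar from the proposal: optional prefix letters, an integer,
# and an optional single "letter-integer" suffix.
COMPONENT_RE = re.compile(r"([a-z]*)([0-9]+)(?:([a-z])([0-9]+))?")


class IncomparableVersions(ValueError):
    """Hypothetical error for components whose prefixes differ."""


def parse_component(text):
    """Split a component into (prefix, number, suffix); suffix is either
    None or a (letter, number) pair."""
    match = COMPONENT_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid version component: {text!r}")
    prefix, number, letter, suffix_number = match.groups()
    suffix = None if letter is None else (letter, int(suffix_number))
    return prefix, int(number), suffix


def parse_version(version):
    """Parse every dot-separated component up front, so bad input is
    rejected even if the comparison would stop early."""
    return [parse_component(part) for part in version.split(".")]


def compare_components(a, b):
    """Compare two parsed components, returning -1, 0 or 1."""
    (prefix_a, num_a, suffix_a), (prefix_b, num_b, suffix_b) = a, b
    if prefix_a != prefix_b:
        raise IncomparableVersions(f"{prefix_a!r} vs {prefix_b!r}")
    if num_a != num_b:
        # Numeric, not textual, comparison: 10 sorts after 9.
        return -1 if num_a < num_b else 1
    if suffix_a == suffix_b:
        return 0
    # Same number: a component WITH a suffix sorts before one without.
    if suffix_a is None:
        return 1
    if suffix_b is None:
        return -1
    # Both have suffixes: compare letter first, then number.
    return -1 if suffix_a < suffix_b else 1


def compare_versions(v1, v2):
    """Compare versions component by component, as Python tuples compare:
    the first differing component decides, otherwise shorter is earlier."""
    parts1, parts2 = parse_version(v1), parse_version(v2)
    for a, b in zip(parts1, parts2):
        result = compare_components(a, b)
        if result:
            return result
    return (len(parts1) > len(parts2)) - (len(parts1) < len(parts2))


if __name__ == "__main__":
    versions = ["1.2", "1.2.1", "1.2a1", "1.2b2", "1.1"]
    print(sorted(versions, key=cmp_to_key(compare_versions)))
    # ['1.1', '1.2a1', '1.2b2', '1.2', '1.2.1']
```

The cmp-style function keeps the "incomparable" case explicit: comparing 1.2.r34567 with 1.2.3 raises rather than picking an arbitrary order.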
Given that such IDs aren't ordered, I think that's OK, as they don't make usable version numbers in any case.

Paul.

_______________________________________________
Distutils-SIG maillist  -  [email protected]
http://mail.python.org/mailman/listinfo/distutils-sig
