The advantage of a slightly more complex 2nd version word (with major/minor/patch) may be that, in the future, some better backwards-compatibility tests could be done. Also, it costs essentially nothing, I think.
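To make the "backwards-compatibility test" idea concrete, here is a minimal Java sketch of a major/minor/patch version word. It is only an illustration: the class name `VersionWord`, the byte assignment (major in the highest of the three lower bytes), and the compatibility policy in `canRead` are all assumptions, not something the thread or UIMA defines.

```java
/** Hypothetical helper: packs major/minor/patch into the 3 lower bytes of a 32-bit word. */
final class VersionWord {
    private VersionWord() {}

    /** Packs three values (each 0..255); the top byte of the word stays 0. */
    static int pack(int major, int minor, int patch) {
        checkByte(major, "major");
        checkByte(minor, "minor");
        checkByte(patch, "patch");
        // Assumed layout: 0x00MMmmpp (major, minor, patch).
        return (major << 16) | (minor << 8) | patch;
    }

    static int major(int word) { return (word >>> 16) & 0xFF; }
    static int minor(int word) { return (word >>> 8) & 0xFF; }
    static int patch(int word) { return word & 0xFF; }

    /**
     * Assumed policy (not from the thread): a reader accepts data with the same
     * major version and a minor version no newer than its own; patch is ignored.
     */
    static boolean canRead(int readerWord, int dataWord) {
        return major(readerWord) == major(dataWord)
            && minor(dataWord) <= minor(readerWord);
    }

    private static void checkByte(int value, String name) {
        if (value < 0 || value > 255) {
            throw new IllegalArgumentException(name + " must be in 0..255, got " + value);
        }
    }
}
```

With a plain incrementing integer, a reader can only test for equality or ordering; with three fields, a check like `VersionWord.canRead(VersionWord.pack(1, 2, 0), dataWord)` can distinguish "newer but still readable" from "incompatible", which is the kind of test alluded to above.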
+1 on your general analysis of versioning :-).

-Marshall

On 1/13/2016 3:56 PM, Richard Eckart de Castilho wrote:
> Hi,
>
> On 13.01.2016, at 21:28, Marshall Schor <[email protected]> wrote:
>> I would turn this on for the repaired binary delta format, and supply a
>> version number.
>>
>> Our current compressed formats use "1" as the incrementing version number.
>>
>> I'm leaning toward something simple, such as using the Major/Minor/Patch
>> format, each value 1 byte, in the 3 lower bytes of the 2nd version word,
>> giving 256 possibilities for each (more than I've ever seen used).
>
> +1 for versioning the CAS formats. Every data format should include version
> information :) The BinaryCasWriter in DKPro Core uses 'D', 'K', 'P', 'r',
> 'o', '1' as the header for the 6+ format (serialization with compression form
> 6 prepended with type system information).
>
> Is it really necessary to have a complex versioning scheme for data formats?
> I'd rather tend towards a plain int versioning: 1, 2, 3, 4, etc. wouldn't
> that be sufficient?
>
>> The "semantic versioning" standard has sparked some push-back (see
>> https://gist.github.com/jashkenas/cbd2b088e20279ae2c8e )
>> basically saying the "mechanical" approach of semantic versioning isn't rich
>> enough for the grey areas of real world use, and ends up obscuring the
>> purpose of indicating how "far" one version is from another.
>
> Regarding SemVer: I don't personally fully trust the plugin we are using.
> E.g. I tried doing some changes to uimaFIT that I believe are
> backwards-compatible but the semver plugin believes otherwise.
>
> Other than that, I am not quite convinced of the criticism towards semver
> either.
>
> Let's just consider (for software):
>
> - if we do bug-fixes, we typically make this a x.y.+1 - bug-fixes shouldn't
>   change the API - sounds reasonable to me
>
> - when adding new features, I would personally always tend towards a x.+1.0 -
>   in the past, we had various UIMA releases that added cool new features but
>   increased the version only at the last digit. Undeserved, I think. Since we
>   use semver, we increase the middle digit more and I think that is very
>   appropriate and reflects the activity in the project much better.
>
> - that leaves the first digit, which IMHO is often a marketing digit:
>   increase it to tell people that all is new and shiny and they should have
>   another fresh look at the project. I don't think we need that. Using it to
>   indicate major breaking changes (which are typically part of a major
>   refactoring with cool new features that people should have a look at) seems
>   quite appropriate to me. We are now in UIMA 2. UIMA 1 was IBM UIMA. I do
>   believe that if we are introducing major changes now like a completely new
>   CAS, that warrants going to UIMA 3.
>
> So looking at that and minus some doubts that I have about the accuracy of
> the semver plugin, I believe that the idea of semver in general is quite
> sensible - at least when going with a three-part versioning scheme. I would
> consider the plugin as an automatic alert for accidentally introducing
> incompatible changes and the semver idea as a guideline. When we consider it
> a good idea, I think we should add exceptions and overrides to the plugin
> for particular releases.
>
> Cheers,
>
> -- Richard
>
>> On 13.01.2016, at 21:28, Marshall Schor <[email protected]> wrote:
>>
>> Hi,
>>
>> I'm working on UIMA-4743 - fixing some binary cas serialization problems,
>> which will unfortunately make the binary serialization for "delta" formats
>> not backward compatible (the fix may have extra bytes in it).
>>
>> We currently have a partially architected scheme for serialization forms,
>> which looks like:
>> - 1 word encoding U + I + M + A and also serving to identify byte order
>> - 1 word for bit-encoding some categorizations:
>> -- a bit for delta / non delta
>> -- a bit for compressed / non compressed
>> - 0 or 1 additional word for incrementing in some fashion a version number
>> for a particular serialization category (named below as "2nd version word")
>>
>> This 2nd version word is currently only used with compressed serialization
>> formats.
>>
>> I'm thinking of assigning another bit in the first word to indicate there's a
>> 2nd version word present.
>>
>> I would turn this on for the repaired binary delta format, and supply a
>> version number.
>>
>> Our current compressed formats use "1" as the incrementing version number.
>>
>> Thinking ahead, perhaps the serialization formats should have a multi-part
>> 2nd version word, along some standards.
>> The "semantic versioning" standard has sparked some push-back (see
>> https://gist.github.com/jashkenas/cbd2b088e20279ae2c8e )
>> basically saying the "mechanical" approach of semantic versioning isn't rich
>> enough for the grey areas of real world use, and ends up obscuring the
>> purpose of indicating how "far" one version is from another.
>>
>> I'm leaning toward something simple, such as using the Major/Minor/Patch
>> format, each value 1 byte, in the 3 lower bytes of the 2nd version word,
>> giving 256 possibilities for each (more than I've ever seen used).
>>
>> Other ideas?
>>
>> -Marshall
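
For readers following the header layout described in the original message, a rough Java sketch of reading such a header might look like the following. It models only the proposed scheme (a flag bit announcing the 2nd version word), not the existing UIMA code. The class name `CasHeaderSketch`, the three flag bit positions, and the placement of the "version word present" bit in the flags word are hypothetical; the thread names the flags but does not give their values.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical reader for the header layout discussed in the thread. */
final class CasHeaderSketch {
    // 'U' 'I' 'M' 'A' as ASCII bytes, read as one big-endian int.
    static final int MAGIC = 0x55494D41;

    // Assumed bit positions, chosen for illustration only.
    static final int FLAG_DELTA = 1 << 0;
    static final int FLAG_COMPRESSED = 1 << 1;
    static final int FLAG_HAS_VERSION_WORD = 1 << 2;

    final ByteOrder order;
    final boolean delta;
    final boolean compressed;
    final int versionWord; // 0 when no 2nd version word is present

    private CasHeaderSketch(ByteOrder order, boolean delta, boolean compressed, int versionWord) {
        this.order = order;
        this.delta = delta;
        this.compressed = compressed;
        this.versionWord = versionWord;
    }

    static CasHeaderSketch read(ByteBuffer buf) {
        // Word 1: the magic value doubles as a byte-order probe.
        buf.order(ByteOrder.BIG_ENDIAN);
        int first = buf.getInt();
        if (first != MAGIC) {
            if (Integer.reverseBytes(first) != MAGIC) {
                throw new IllegalArgumentException("missing UIMA magic word");
            }
            buf.order(ByteOrder.LITTLE_ENDIAN);
        }
        // Word 2: categorization bits.
        int flags = buf.getInt();
        // Word 3 (optional): only present when the flag bit says so.
        int version = (flags & FLAG_HAS_VERSION_WORD) != 0 ? buf.getInt() : 0;
        return new CasHeaderSketch(
            buf.order(),
            (flags & FLAG_DELTA) != 0,
            (flags & FLAG_COMPRESSED) != 0,
            version);
    }
}
```

The point of the explicit flag bit is that a reader never has to guess whether a third word follows: older non-compressed data simply leaves the bit off, while the repaired delta format would set it and supply a version.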
