Hi,

On 13.01.2016, at 21:28, Marshall Schor <[email protected]> wrote:
> I would turn this on for the repaired binary delta format, and supply a
> version number.
>
> Our current compressed formats use "1" as the incrementing version number.
>
> I'm leaning toward something simple, such as using the Major/Minor/Patch
> format, each value 1 byte, in the 3 lower bytes of the 2nd version word,
> giving 256 possibilities for each (more than I've ever seen used).

+1 for versioning the CAS formats. Every data format should include version
information :)

The BinaryCasWriter in DKPro Core uses 'D', 'K', 'P', 'r', 'o', '1' as the
header for the 6+ format (serialization with compression form 6 prepended
with type system information).

Is it really necessary to have a complex versioning scheme for data formats?
I'd rather tend towards plain int versioning: 1, 2, 3, 4, etc. Wouldn't
that be sufficient?

> The "semantic versioning" standard has sparked some push-back (see
> https://gist.github.com/jashkenas/cbd2b088e20279ae2c8e )
> basically saying the "mechanical" approach of semantic versioning isn't
> rich enough for the grey areas of real world use, and ends up obscuring
> the purpose of indicating how "far" one version is from another.

Regarding SemVer: I don't personally fully trust the plugin we are using.
E.g. I tried making some changes to uimaFIT that I believe are
backwards-compatible, but the semver plugin believes otherwise.

Other than that, I am not quite convinced by the criticism of semver
either. Let's just consider (for software):

- if we do bug-fixes, we typically make this an x.y.+1
  - bug-fixes shouldn't change the API - sounds reasonable to me
- when adding new features, I would personally always tend towards an x.+1.0
  - in the past, we had various UIMA releases that added cool new features
    but increased the version only at the last digit. Undeserved, I think.
    Since we use semver, we increase the middle digit more and I think that
    is very appropriate and reflects the activity in the project much better.
- that leaves the first digit, which IMHO is often a marketing digit:
  increase it to tell people that all is new and shiny and they should have
  another fresh look at the project.
  I don't think we need that. Using it to indicate major breaking changes
  (which are typically part of a major refactoring with cool new features
  that people should have a look at) seems quite appropriate to me. We are
  now in UIMA 2. UIMA 1 was IBM UIMA. I do believe that if we are
  introducing major changes now like a completely new CAS, that warrants
  going to UIMA 3.

So looking at that and minus some doubts that I have about the accuracy of
the semver plugin, I believe that the idea of semver in general is quite
sensible - at least when going with a three-part versioning scheme.

I would consider the plugin as an automatic alert for accidentally
introducing incompatible changes and the semver idea as a guideline. When we
consider it a good idea, I think we should add exceptions and overrides to
the plugin for particular releases.

Cheers,

-- Richard

> On 13.01.2016, at 21:28, Marshall Schor <[email protected]> wrote:
>
> Hi,
>
> I'm working on UIMA-4743 - fixing some binary cas serialization problems,
> which will unfortunately make the binary serialization for "delta" formats
> not backward compatible (the fix may have extra bytes in it).
>
> We currently have a partially architected scheme for serialization forms,
> which looks like:
>   - 1 word encoding U + I + M + A and also serving to identify byte order
>   - 1 word for bit-encoding some categorizations:
>     -- a bit for delta / non delta
>     -- a bit for compressed / non compressed
>   - 0 or 1 additional word for incrementing in some fashion a version
>     number for a particular serialization category (named below as
>     "2nd version word")
>
> This 2nd version word is currently only used with compressed serialization
> formats.
>
> I'm thinking of assigning another bit in the first word to indicate
> there's a 2nd version word present.
>
> I would turn this on for the repaired binary delta format, and supply a
> version number.
>
> Our current compressed formats use "1" as the incrementing version number.
>
> Thinking ahead, perhaps the serialization formats should have a multi-part
> 2nd version word, along some standards.
> The "semantic versioning" standard has sparked some push-back (see
> https://gist.github.com/jashkenas/cbd2b088e20279ae2c8e )
> basically saying the "mechanical" approach of semantic versioning isn't
> rich enough for the grey areas of real world use, and ends up obscuring
> the purpose of indicating how "far" one version is from another.
>
> I'm leaning toward something simple, such as using the Major/Minor/Patch
> format, each value 1 byte, in the 3 lower bytes of the 2nd version word,
> giving 256 possibilities for each (more than I've ever seen used).
>
> Other ideas?
>
> -Marshall
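
For reference, the Major/Minor/Patch layout proposed in the quoted message could look roughly like the sketch below. It only illustrates the bit arithmetic: the class name VersionWord and its methods are made up for this example and are not part of the UIMA API, and putting major in the highest of the three lower bytes is an assumption, since the thread does not fix the byte order within the word.

```java
// Hypothetical helper for illustration only; not part of UIMA.
class VersionWord {

    // Pack three values (each 0..255) into the 3 lower bytes of a 32-bit word.
    // Assumed layout: [unused][major][minor][patch], highest byte left at 0.
    static int pack(int major, int minor, int patch) {
        if ((major | minor | patch) < 0 || major > 255 || minor > 255 || patch > 255) {
            throw new IllegalArgumentException("each part must fit in one byte (0..255)");
        }
        return (major << 16) | (minor << 8) | patch;
    }

    // Extract the individual parts again by shifting and masking.
    static int major(int word) { return (word >>> 16) & 0xFF; }
    static int minor(int word) { return (word >>> 8) & 0xFF; }
    static int patch(int word) { return word & 0xFF; }

    public static void main(String[] args) {
        int word = pack(1, 2, 3);
        // Prints: 0x00010203 -> 1.2.3
        System.out.printf("0x%08X -> %d.%d.%d%n", word, major(word), minor(word), patch(word));
    }
}
```

With this layout, the existing compressed formats' version "1" would read back as 0.0.1, so a plain incrementing int and the three-part scheme can share the same word as long as the counter stays below 256.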
