Re: versioning cas serializations

Marshall Schor Wed, 13 Jan 2016 14:35:03 -0800

The advantage of a slightly more complex 2nd version word (with
major/minor/patch) may be that, in the future, better backward-compatibility
tests could be done.  Also, it costs essentially nothing, I think.
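
A minimal sketch of what such a version word and a compatibility test might
look like, assuming the layout proposed further down the thread (one byte each
for major/minor/patch in the 3 lower bytes of the word). The class name
FormatVersion and the canRead policy are hypothetical illustrations, not
existing UIMA API:

```java
// Hypothetical helper, not part of UIMA: packs major/minor/patch into the
// 3 lower bytes of a 32-bit version word (the top byte stays 0).
final class FormatVersion {
    final int major;
    final int minor;
    final int patch;

    FormatVersion(int major, int minor, int patch) {
        if (major < 0 || major > 255 || minor < 0 || minor > 255
                || patch < 0 || patch > 255) {
            throw new IllegalArgumentException("each part must fit in one byte (0..255)");
        }
        this.major = major;
        this.minor = minor;
        this.patch = patch;
    }

    // Layout: 0x00 | major | minor | patch
    int toWord() {
        return (major << 16) | (minor << 8) | patch;
    }

    static FormatVersion fromWord(int word) {
        return new FormatVersion((word >>> 16) & 0xFF, (word >>> 8) & 0xFF, word & 0xFF);
    }

    // Assumed policy, for illustration only: a reader accepts data with the
    // same major version and a minor version no newer than its own.
    boolean canRead(FormatVersion data) {
        return data.major == this.major && data.minor <= this.minor;
    }
}
```

With a plain incrementing integer, a reader can only test for equality or
ordering; with separate parts, a check such as canRead above could accept
older minor revisions while still rejecting a different major format.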


+1 on your general analysis of versioning :-).

-Marshall

On 1/13/2016 3:56 PM, Richard Eckart de Castilho wrote:
> Hi,
>
> On 13.01.2016, at 21:28, Marshall Schor <[email protected]> wrote:
>> I would turn this on for the repaired binary delta format, and supply a
>> version number.
>>
>> Our current compressed formats use "1" as the incrementing version number.
>>
>> I'm leaning toward something simple, such as using the Major/Minor/Patch
>> format, each value 1 byte, in the 3 lower bytes of the 2nd version word,
>> giving 256 possibilities for each (more than I've ever seen used).
> +1 for versioning the CAS formats. Every data format should include version 
> information :) The BinaryCasWriter in DKPro Core uses 'D', 'K', 'P', 'r', 
> 'o', '1' as the header for the 6+ format (serialization with compression form 
> 6  prepended with type system information).
>
> Is it really necessary to have a complex versioning scheme for data formats? 
> I'd rather tend towards a plain int versioning: 1, 2, 3, 4, etc. wouldn't 
> that be sufficient?
>
>> The "semantic versioning" standard has sparked some push-back (see
>> https://gist.github.com/jashkenas/cbd2b088e20279ae2c8e )
>> basically saying the "mechanical" approach of semantic versioning isn't rich
>> enough for the grey areas of real world use, and ends up obscuring the
>> purpose of indicating how "far" one version is from another.
>
> Regarding SemVer: I don't personally fully trust the plugin we are using. 
> E.g. I tried doing some changes to uimaFIT that I believe are 
> backwards-compatible but the semver plugin believes otherwise. 
>
> Other than that, I am not quite convinced of the criticism towards semver 
> either. 
>
> Let's just consider (for software):
>
> - if we do bug-fixes, we typically make this a x.y.+1 - bug-fixes shouldn't 
> change the API - sounds reasonable to me
>
> - when adding new features, I would personally always tend towards a x.+1.0 - 
> in the past, we had various UIMA releases that added cool new features but 
> increased the version only at the last digit. Undeserved, I think. Since we 
> started using semver, we increase the middle digit more often, and I think 
> that is very appropriate and reflects the activity in the project much better.
>
> - that leaves the first digit, which IMHO is often a marketing digit: 
> increase it to tell people that all is new and shiny and they should have 
> another fresh look at the project. I don't think we need that. Using it to 
> indicate major breaking changes (which are typically part of a major 
> refactoring with cool new features that people should have a look at) seems 
> quite appropriate to me. We are now in UIMA 2. UIMA 1 was IBM UIMA. I do 
> believe that if we are introducing major changes now like a completely new 
> CAS, that warrants going to UIMA 3.
>
> So looking at that and minus some doubts that I have about the accuracy of 
> the semver plugin, I believe that the idea of semver in general is quite 
> sensible - at least when going with a three-part versioning scheme. I would 
> consider the plugin as an automatic alert for accidentally introducing 
> incompatible changes and the semver idea as a guideline. When we consider it 
> a good idea, I think we should add exceptions and overrides to the plugin 
> for particular releases.
>
> Cheers,
>
> -- Richard
>
>> On 13.01.2016, at 21:28, Marshall Schor <[email protected]> wrote:
>>
>> Hi,
>>
>> I'm working on UIMA-4743 - fixing some binary cas serialization problems,
>> which will unfortunately make the binary serialization for "delta" formats
>> not backward compatible (the fix may have extra bytes in it).
>>
>> We currently have a partially architected scheme for serialization forms,
>> which looks like:
>>  - 1 word encoding U + I + M + A and also serving to identify byte order
>>  - 1 word for bit-encoding some categorizations:
>>     -- a bit for delta / non delta
>>     -- a bit for compressed / non compressed
>>  - 0 or 1 additional word for incrementing in some fashion a version number
>>    for a particular serialization category (named below as "2nd version word")
>>
>> This 2nd version word is currently only used with compressed serialization
>> formats.
>>
>> I'm thinking of assigning another bit in the first word to indicate there's a
>> 2nd version word present.
>>
>> I would turn this on for the repaired binary delta format, and supply a
>> version number.
>>
>> Our current compressed formats use "1" as the incrementing version number.
>>
>> Thinking ahead, perhaps the serialization formats should have a multi-part
>> 2nd version word, along some standards.
>> The "semantic versioning" standard has sparked some push-back (see
>> https://gist.github.com/jashkenas/cbd2b088e20279ae2c8e )
>> basically saying the "mechanical" approach of semantic versioning isn't rich
>> enough for the grey areas of real world use, and ends up obscuring the
>> purpose of indicating how "far" one version is from another.
>>
>> I'm leaning toward something simple, such as using the Major/Minor/Patch
>> format, each value 1 byte, in the 3 lower bytes of the 2nd version word,
>> giving 256 possibilities for each (more than I've ever seen used).
>>
>> Other ideas?
>>
>> -Marshall
>
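
The header scheme described in the original message (a word spelling U, I, M,
A that also reveals byte order, a word of category bits, and an optional 2nd
version word) could be read roughly as follows. This is only a sketch under
stated assumptions: the class name CasHeaderSketch and the three bit positions
are hypothetical and are not taken from the actual UIMA serialization code;
the "has version word" bit is placed in the bit-encoding word, which is one
reading of the proposal.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch of reading the header words; not UIMA's real reader.
final class CasHeaderSketch {
    // ASCII 'U','I','M','A' packed into one 32-bit word.
    static final int MAGIC = 0x55494D41;

    // Assumed bit positions, chosen for illustration only.
    static final int FLAG_DELTA = 1 << 0;
    static final int FLAG_COMPRESSED = 1 << 1;
    static final int FLAG_HAS_VERSION_WORD = 1 << 2;

    final ByteOrder order;
    final boolean delta;
    final boolean compressed;
    final int versionWord; // 0 when no 2nd version word is present

    private CasHeaderSketch(ByteOrder order, int flags, int versionWord) {
        this.order = order;
        this.delta = (flags & FLAG_DELTA) != 0;
        this.compressed = (flags & FLAG_COMPRESSED) != 0;
        this.versionWord = versionWord;
    }

    static CasHeaderSketch read(ByteBuffer buf) {
        // Read the first word as big-endian; if it only matches after
        // swapping bytes, the writer used little-endian order.
        buf.order(ByteOrder.BIG_ENDIAN);
        int first = buf.getInt();
        ByteOrder order;
        if (first == MAGIC) {
            order = ByteOrder.BIG_ENDIAN;
        } else if (Integer.reverseBytes(first) == MAGIC) {
            order = ByteOrder.LITTLE_ENDIAN;
        } else {
            throw new IllegalArgumentException("header does not start with UIMA");
        }
        buf.order(order);
        int flags = buf.getInt();
        // The extra word is consumed only when the flag says it is there.
        int version = (flags & FLAG_HAS_VERSION_WORD) != 0 ? buf.getInt() : 0;
        return new CasHeaderSketch(order, flags, version);
    }
}
```

A dedicated flag bit lets older streams without the extra word remain
distinguishable from newer ones that carry it.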
