[
https://issues.apache.org/jira/browse/THRIFT-5340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17270387#comment-17270387
]
Juan Cruz Viotti commented on THRIFT-5340:
------------------------------------------
Hi [~jensg],
Thanks a lot for the thorough comment. The executive summary and the Thrift
guide's section on compatibility
(https://diwakergupta.github.io/thrift-missing-guide/#_versioning_compatibility)
make a lot of sense. My experiments led to a perhaps more fine-grained
analysis of compatibility that I would like to share with you in case you think
it's useful to document. Also, I found that compatibility is a bit hard to test
as a particular scenario may be more subtle than it seems when using certain
data types or depending on the surrounding data. Please let me know if
something is off and I can investigate further!
What I found so far is:
* Adding an optional field to the end of a struct is fully compatible
* Removing an optional field (while not re-using the identifier) is fully
compatible
* Adding a required field to the end of a struct is forwards compatible. The
new schema encodes the new field and the old schema gracefully skips it as it's
unknown
* Conversely, removing a required field from the end of the struct is
backwards compatible. The old schema encodes the removed field and the new
schema skips it as it's unknown
* Changing a field from optional to required is forwards compatible. The new
schema always sets the field and the old schema can parse it correctly
* Conversely, following the same reasoning, changing a field from required to
optional is backwards compatible
* Adding a choice to an existing union is backwards compatible. The new schema
can always parse the data produced by the old schema as the new choices are a
superset of the previous choices
* Conversely, removing a choice from an existing union is forwards compatible
* Changing an enumeration into a 32-bit integer is backwards compatible as the
set of valid enumeration constants is a subset of the range of values of the
integer
* Conversely, turning a 32-bit integer into an enumeration is forwards
compatible only. The data produced by the new schema will always become a valid
int32 value according to the old schema
* Similar to the union case, adding a new enumeration constant is backwards
compatible and removing an enumeration constant (without re-using it in the
future) is forwards compatible
* Changing a string field into a "binary" field is backwards compatible. Every
UTF-8 string produced by the old schema is of course a valid byte-string from
the point of view of the new schema
* Conversely, changing a "binary" field into a string field is forwards
compatible. It is not backwards compatible as not every byte-array is a valid
UTF-8 string
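As a rough illustration of the struct rules above, here is a minimal Python
sketch. It does not use the Thrift library: the encode/decode helpers, the
schema dictionaries, and the field names are all hypothetical, and it assumes
a reader that skips unknown field identifiers and rejects a missing required
field (actual behaviour may differ between language bindings).

```python
# Toy model of field-id based struct decoding. A wire message is a dict of
# field id -> value; a schema maps field id -> (name, requiredness).
# This is NOT the Thrift library, only a sketch of the rules listed above.

class MissingRequiredField(Exception):
    pass

def encode(schema, record):
    """Encode a record (name -> value) as {field_id: value}."""
    wire = {}
    for field_id, (name, requiredness) in schema.items():
        if name in record:
            wire[field_id] = record[name]
        elif requiredness == "required":
            raise MissingRequiredField(name)
    return wire

def decode(schema, wire):
    """Decode {field_id: value}; unknown field ids are skipped."""
    record = {}
    for field_id, value in wire.items():
        if field_id in schema:
            record[schema[field_id][0]] = value
    # Assumption: the reader validates required fields after reading.
    for name, requiredness in schema.values():
        if requiredness == "required" and name not in record:
            raise MissingRequiredField(name)
    return record

OLD = {1: ("name", "required")}
NEW_OPTIONAL = {1: ("name", "required"), 2: ("email", "optional")}
NEW_REQUIRED = {1: ("name", "required"), 2: ("email", "required")}

old_wire = encode(OLD, {"name": "a"})

# Adding an optional field: both directions work.
new_wire = encode(NEW_OPTIONAL, {"name": "a", "email": "a@example.com"})
old_reads_optional = decode(OLD, new_wire)
new_reads_old = decode(NEW_OPTIONAL, old_wire)

# Adding a required field: the old reader copes with new data,
# but the new reader rejects data written with the old schema.
required_wire = encode(NEW_REQUIRED, {"name": "a", "email": "a@example.com"})
old_reads_required = decode(OLD, required_wire)
try:
    decode(NEW_REQUIRED, old_wire)
    new_required_reads_old_ok = True
except MissingRequiredField:
    new_required_reads_old_ok = False
```

The same model covers the "conversely" cases by swapping which schema is
treated as old and which as new.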
Everything else I tried either resulted in an exception or in an incompatible
result at least for one case.
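The string/"binary" case is a small example of such a one-directional failure.
The sketch below is plain Python rather than Thrift code; the two reader
functions are hypothetical, and it assumes a string reader that strictly
validates UTF-8 (a binding that does not validate would return garbled text
instead of raising).

```python
# Hypothetical readers for the same length-prefixed payload bytes.

def read_as_binary(payload: bytes) -> bytes:
    """A "binary" field accepts any byte sequence unchanged."""
    return payload

def read_as_string(payload: bytes) -> str:
    """A string field is assumed to require valid UTF-8 (strict decoding)."""
    return payload.decode("utf-8")

text_payload = "héllo".encode("utf-8")  # written through a string field
raw_payload = b"\xff\xfe\x00"           # written through a "binary" field

# string -> "binary": every UTF-8 string is a valid byte-string.
binary_reads_text = read_as_binary(text_payload)

# "binary" -> string: fine for payloads that happen to be UTF-8...
string_reads_text = read_as_string(text_payload)

# ...but arbitrary bytes are rejected by the string reader.
try:
    read_as_string(raw_payload)
    string_reads_raw_ok = True
except UnicodeDecodeError:
    string_reads_raw_ok = False
```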
Do you think the above makes sense? My intention is to provide a more detailed
set of operations that can be assumed to be safe, broken down by backwards,
forwards, and full compatibility. Thanks a lot!
> Document schema evolution features
> ----------------------------------
>
> Key: THRIFT-5340
> URL: https://issues.apache.org/jira/browse/THRIFT-5340
> Project: Thrift
> Issue Type: Improvement
> Components: Documentation
> Reporter: Juan Cruz Viotti
> Priority: Minor
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> I could not find a section in the documentation outlining the schema
> evolution/versioning features that Thrift provides.
> In case there is none, I volunteer to write the first draft, as I've been
> writing a paper involving Apache Thrift as part of my MSc at the University
> of Oxford, and ran plenty of schema evolution experiments.
> Please let me know your thoughts and where this section would fit!
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)