[
https://issues.apache.org/jira/browse/AVRO-419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Raymie Stata reassigned AVRO-419:
---------------------------------
Assignee: Raymie Stata
> Consistent laziness when resolving partially-compatible changes
> ---------------------------------------------------------------
>
> Key: AVRO-419
> URL: https://issues.apache.org/jira/browse/AVRO-419
> Project: Avro
> Issue Type: Bug
> Components: spec
> Reporter: Raymie Stata
> Assignee: Raymie Stata
> Priority: Major
>
> Avro schema resolution is generally "lazy" when it comes to dealing with
> incompatible changes. If the writer writes a union of "int" and "null", and
> the reader expects just an "int", Avro doesn't raise an exception unless the
> writer _actually_ writes a "null" (and the reader attempts to read it).
> This laziness is a powerful feature for supporting "forward compatibility"
> (old readers reading data written by new writers). In the example just
> given, we might decide at some point that a column needs to be
> "nullable" but there's a lot of old code that assumes that it's not. When
> using old code, we can often arrange to avoid sending the old code any new
> records that have null-values in that column. It's powerful to allow new
> writers to write against the nullable schema and allow readers to read those
> records. (For this to be safe, it's also important that this be _checked,_
> i.e., that a run-time error is thrown if a bad value is passed to the reader.)
> Avro is lazy in many places (e.g., in the union example just given, and for
> enumerations). But it's not _consistently_ lazy. I propose we comb through
> the spec and make it lazy in all places we can, unless there's a compelling
> reason not to.
> Numeric types are one area where Avro is not consistently lazy. I propose
> that we fairly liberally allow any change from one numeric type to another,
> and raise errors at runtime if bad values are found. An "int" can be changed
> to a "long", for example, and an error is raised when a reader gets an
> out-of-bounds value. A "double" can be changed to an "int", and an error is
> raised if the reader gets a non-integer value or an out-of-bounds value.
> (I'm not sure if there are types beyond numerics where we could be more
> consistently lazy, but I decided to write this issue generically just in
> case.)
> (One might object that these checks are expensive, but note that they are
> only needed when the reader and writer specs don't agree. Thus, if these
> checks are induced, then the system designer _wanted_ these checks; we're
> only adding value here, not inducing costs.)
> I'm not sure if there are other a
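
The lazy numeric resolution proposed in the issue could look roughly like the
sketch below. It is an illustration only, not Avro's implementation or API:
the names resolve_to_int and LazyResolutionError are hypothetical, and it
assumes the reader expects a 32-bit "int" while the writer may have written a
"long", "float", or "double". The check runs per value at read time, so a
mismatched schema only fails when a bad value actually arrives.

```python
# Hypothetical sketch of "lazy" numeric resolution; not part of any Avro library.

INT_MIN, INT_MAX = -2**31, 2**31 - 1  # range of Avro's 32-bit "int"


class LazyResolutionError(ValueError):
    """Hypothetical error: the reader's type cannot represent the value."""


def resolve_to_int(value):
    """Convert a value written as long/float/double to a reader's "int".

    Raises LazyResolutionError for non-integer or out-of-bounds values.
    """
    if isinstance(value, float):
        # is_integer() is False for fractional values, NaN, and infinities.
        if not value.is_integer():
            raise LazyResolutionError(f"non-integer value: {value!r}")
        value = int(value)
    if not INT_MIN <= value <= INT_MAX:
        raise LazyResolutionError(f"out of bounds for int: {value!r}")
    return value
```

Under these assumptions, resolve_to_int(3.0) yields the integer 3, while
resolve_to_int(3.5) and resolve_to_int(2**31) raise at the moment the value is
read, which is the "checked" behavior the issue asks for.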
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)