[
https://issues.apache.org/jira/browse/AVRO-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17093111#comment-17093111
]
Roger commented on AVRO-2647:
-----------------------------
> I actually would say that the schema is not ambiguous. The default value is
> correct given the type of the field.
Given that E1 and E2 are distinct types that should presumably be
distinguishable after decoding, what will be the decoded type of `R.F[0]` when
decoding a value of type `R` from a version of `R` that doesn't have the field
`F`?
If the schema isn't ambiguous there should be an unambiguous answer to the
above question, but I don't believe there is.
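To make the question concrete, here is a minimal sketch in plain Python. It
uses no Avro library, and the helper names are hypothetical; only the two enum
definitions and the default value are taken from the schema in the issue. It
lists, for each symbol in the default, which union branches could accept it,
and what happens if the first-branch rule that the spec gives for union fields
is assumed to apply to array items as well.

```python
# Enum branches of the union used for the items of R.F, in schema order.
# (Dicts preserve insertion order in Python 3.7+.)
UNION_BRANCHES = {
    "E1": ["A", "B"],
    "E2": ["B", "A", "C"],
}

# The default value given for field F in the schema.
DEFAULT_F = ["A", "B", "C"]


def candidate_branches(symbol, branches=UNION_BRANCHES):
    """Return the names of every enum branch that declares `symbol`."""
    return [name for name, symbols in branches.items() if symbol in symbols]


def first_branch_only(symbol, branches=UNION_BRANCHES):
    """Resolve `symbol` against the first branch only (an assumed rule).

    Returns the branch name, or None if the first branch lacks the symbol.
    """
    first_name = next(iter(branches))
    return first_name if symbol in branches[first_name] else None


if __name__ == "__main__":
    for symbol in DEFAULT_F:
        print(symbol, candidate_branches(symbol), first_branch_only(symbol))
```

In this sketch "A" and "B" are accepted by both E1 and E2, so matching by value
alone cannot say which type `R.F[0]` has, while "C" is accepted only by E2, so
a first-branch-only rule cannot represent it at all.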
> specification does not fully specify semantics for unions in default types
> --------------------------------------------------------------------------
>
> Key: AVRO-2647
> URL: https://issues.apache.org/jira/browse/AVRO-2647
> Project: Apache Avro
> Issue Type: Bug
> Components: doc
> Reporter: Roger
> Priority: Minor
>
> Currently the specification does not make the semantics for union types
> within complex types clear. In particular, the spec talks about union fields,
> but leaves the semantics for unions in other contexts unspecified.
> Here's an example which is undefined according to the current specification:
> {code:json}
> {
>   "type": "record",
>   "name": "R",
>   "fields": [
>     {
>       "name": "F",
>       "type": {
>         "type": "array",
>         "items": [
>           {
>             "type": "enum",
>             "name": "E1",
>             "symbols": ["A", "B"]
>           },
>           {
>             "type": "enum",
>             "name": "E2",
>             "symbols": ["B", "A", "C"]
>           }
>         ]
>       },
>       "default": ["A", "B", "C"]
>     }
>   ]
> }
> {code}
> By experiment, most implementations seem to have chosen the semantics that
> are documented in this PR.
> In Java, the schema above is parsed without error, but when attempting to use
> the default value, it fails with a NullPointerException (trying to find the
> symbol C in E1). (Thanks to Ryan Skraba for this.)
> In [gogen-avro|https://github.com/actgardner/gogen-avro] it generates invalid
> code because it's assuming E1 but generating the symbol for "C" anyway.
> FWIW at some point in the future, I believe that it would be nice to align
> the default value specification with the JSON encoding for Avro so there
> aren't two subtly different JSON encodings of an Avro value.