[ https://issues.apache.org/jira/browse/AVRO-656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977852#action_12977852 ]
Doug Cutting commented on AVRO-656:
-----------------------------------
> is the problem that there's no wrapper object for a union value?
Yes. Avro implementations in languages where runtime type information is
available (Java, Python, Ruby, PHP, etc.) don't currently use a wrapper for
union values. This is convenient, since one can simply pass, e.g., either an
int or a float, and the right thing will happen, but also can create some
confusion when the language's types don't align well with Avro's. For example,
Python does not have a single-precision floating point type, just a double.
Generally this isn't a problem: if a schema declares a single-float, the double
is truncated when it's written and expanded when it's read. But if a union
contains both single and double types, then Python will always write the type
that's listed first in the union. Such confusions aren't so great that anyone
seems to be agitating to add union wrappers to these implementations, since that
would make unions considerably less convenient to use. But if there are ways
we can modify Avro to minimize such confusions, so much the better.
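To illustrate the wrapper-less behavior described above, here is a minimal sketch of how a dynamically typed writer might pick a union branch: it checks the datum's runtime type against each branch in order and takes the first match. The function names (`accepts`, `resolve_union_branch`) and the simplified per-type rules are hypothetical, not the actual Avro Python API. They only cover a few primitive types.

```python
# Hypothetical sketch of wrapper-less union branch selection: the first
# branch whose type accepts the datum wins. Not the real Avro Python API.
INT_MIN, INT_MAX = -(2**31), 2**31 - 1
LONG_MIN, LONG_MAX = -(2**63), 2**63 - 1


def accepts(branch: str, datum) -> bool:
    """Simplified check of whether a primitive branch can hold the datum."""
    if branch == "null":
        return datum is None
    if branch == "boolean":
        return isinstance(datum, bool)
    # bool is a subclass of int in Python, so exclude it explicitly
    is_int = isinstance(datum, int) and not isinstance(datum, bool)
    if branch == "int":
        return is_int and INT_MIN <= datum <= INT_MAX
    if branch == "long":
        return is_int and LONG_MIN <= datum <= LONG_MAX
    if branch in ("float", "double"):
        # Python has only a double-precision float, so a Python float
        # matches either Avro branch; the branch order alone decides.
        return isinstance(datum, float)
    if branch == "string":
        return isinstance(datum, str)
    raise ValueError(f"unsupported branch in this sketch: {branch}")


def resolve_union_branch(union: list, datum) -> int:
    """Return the index of the first union branch that accepts the datum."""
    for index, branch in enumerate(union):
        if accepts(branch, datum):
            return index
    raise ValueError(f"datum {datum!r} matches no branch of {union}")


print(resolve_union_branch(["float", "double"], 3.14))  # → 0
print(resolve_union_branch(["double", "float"], 3.14))  # → 0
```

Whichever floating-point type is listed first always wins, which matches the behavior described above. The same happens for `["int", "long"]` with any small integer.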
> writing unions with multiple records, fixed or enums can choose wrong branch
> -----------------------------------------------------------------------------
>
> Key: AVRO-656
> URL: https://issues.apache.org/jira/browse/AVRO-656
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.4.0
> Reporter: Doug Cutting
> Assignee: Doug Cutting
> Fix For: 1.5.0
>
> Attachments: AVRO-656.patch, AVRO-656.patch
>
>
> According to the specification, a union may contain multiple instances of a
> named type, provided they have different names. There are several bugs in
> the Java implementation of this when writing data:
> - for record, only the short-name of the record is checked, so the branch
> for a record of the same name in a different namespace may be used by mistake
> - for enum and fixed, the name of the schema is not checked at all, so the
> first enum or fixed in the union will always be assumed when writing. In many
> cases this may cause the wrong data to be written, potentially corrupting
> output.
> This is not a regression. This has never been implemented correctly by Java.
> Python and Ruby never check names, but rather perform a full, recursive
> validation of content.
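The fix the issue calls for amounts to matching named branches (record, enum, fixed) by full name, meaning namespace plus name, rather than by short name or by position. Below is a minimal sketch that contrasts the two. It assumes union branches are schema dicts and that the writer already knows the full name of the datum's schema. `resolve_named_branch` and `resolve_by_short_name` are hypothetical helpers, not the Java or Python Avro API.

```python
from typing import Optional

# Hypothetical sketch: resolve a named union branch by full name.
NAMED_TYPES = {"record", "enum", "fixed"}


def full_name(name: str, namespace: Optional[str]) -> str:
    """Qualify a name with its namespace unless it is already qualified."""
    if "." in name or not namespace:
        return name
    return f"{namespace}.{name}"


def short_name(name: str) -> str:
    return name.rsplit(".", 1)[-1]


def resolve_named_branch(union: list, datum_full_name: str) -> int:
    """Correct: compare full names, so same-named types in other namespaces differ."""
    for index, branch in enumerate(union):
        if isinstance(branch, dict) and branch.get("type") in NAMED_TYPES:
            if full_name(branch["name"], branch.get("namespace")) == datum_full_name:
                return index
    raise ValueError(f"no branch named {datum_full_name}")


def resolve_by_short_name(union: list, datum_full_name: str) -> int:
    """Buggy: compares short names only, as described for records above."""
    target = short_name(datum_full_name)
    for index, branch in enumerate(union):
        if isinstance(branch, dict) and branch.get("type") in NAMED_TYPES:
            if short_name(branch["name"]) == target:
                return index
    raise ValueError(f"no branch named {target}")


POINTS = [
    {"type": "record", "name": "Point", "namespace": "geo.v1", "fields": []},
    {"type": "record", "name": "Point", "namespace": "geo.v2", "fields": []},
]
print(resolve_named_branch(POINTS, "geo.v2.Point"))   # → 1
print(resolve_by_short_name(POINTS, "geo.v2.Point"))  # → 0 (wrong branch)
```

The same full-name lookup also covers the enum and fixed case, where the buggy writer ignored names entirely and always chose the first such branch.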
--
This message is automatically generated by JIRA.