[ 
https://issues.apache.org/jira/browse/AVRO-656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977859#action_12977859
 ] 

Doug Cutting commented on AVRO-656:
-----------------------------------

> this patch would be a major backwards-incompatible change to the spec. In our 
> code, we're using the ["null", "fixed4", "fixed16"] case all the time to 
> represent IPv4 or IPv6 addresses

That would nix the patch, then, since we don't want to introduce such an 
incompatibility. If C does correctly implement unions as specified then I was 
mistaken to assert above that no language did.

So instead perhaps I should fix Java to correctly implement unions as currently 
specified:
 - fixing union dispatch among records to consider the namespace (easy, should 
be compatible, already in this patch)
 - adding a getSchema() method to GenericEnumSymbol and GenericFixed so that we 
can check the name (incompatible API change, since it adds a Schema parameter 
to the constructors for these)

Unless there are objections, I'll try this approach.
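
As a rough illustration of the two items above, here is a minimal, 
self-contained sketch of name-based union dispatch. It does not use the Avro 
API: NamedSchema, FixedDatum, resolveByKind and resolveByFullName are 
hypothetical names invented for this example, and only the idea of a 
getSchema() accessor on the datum comes from the proposal. It contrasts 
matching on the type kind alone with matching on the full 
(namespace-qualified) name, using the ["null", "fixed4", "fixed16"] union 
mentioned above.

```java
import java.util.Arrays;
import java.util.List;

public class UnionDispatchSketch {

    /** Hypothetical stand-in for a schema; not the Avro Schema class. */
    static final class NamedSchema {
        final String kind;      // "null", "record", "enum" or "fixed"
        final String fullName;  // namespace-qualified name; null for unnamed types

        NamedSchema(String kind, String fullName) {
            this.kind = kind;
            this.fullName = fullName;
        }
    }

    /** Hypothetical fixed datum that carries its schema, as getSchema() would allow. */
    static final class FixedDatum {
        private final NamedSchema schema;
        final byte[] bytes;

        FixedDatum(NamedSchema schema, byte[] bytes) {
            this.schema = schema;
            this.bytes = bytes;
        }

        NamedSchema getSchema() {
            return schema;
        }
    }

    /** Kind-only matching: always returns the first branch of the datum's kind. */
    static int resolveByKind(List<NamedSchema> union, FixedDatum datum) {
        for (int i = 0; i < union.size(); i++) {
            if (union.get(i).kind.equals(datum.getSchema().kind)) {
                return i;
            }
        }
        throw new IllegalArgumentException("datum is not in the union");
    }

    /** Name-aware matching: the branch must agree on both kind and full name. */
    static int resolveByFullName(List<NamedSchema> union, FixedDatum datum) {
        NamedSchema wanted = datum.getSchema();
        for (int i = 0; i < union.size(); i++) {
            NamedSchema branch = union.get(i);
            if (branch.kind.equals(wanted.kind)
                    && branch.fullName != null
                    && branch.fullName.equals(wanted.fullName)) {
                return i;
            }
        }
        throw new IllegalArgumentException("datum is not in the union");
    }

    public static void main(String[] args) {
        List<NamedSchema> union = Arrays.asList(
                new NamedSchema("null", null),
                new NamedSchema("fixed", "fixed4"),
                new NamedSchema("fixed", "fixed16"));

        // A 16-byte value declared as fixed16, e.g. an IPv6 address.
        FixedDatum ipv6 = new FixedDatum(union.get(2), new byte[16]);

        System.out.println(resolveByKind(union, ipv6));     // prints 1 (wrong branch)
        System.out.println(resolveByFullName(union, ipv6)); // prints 2 (intended branch)
    }
}
```

Comparing full names rather than short names is also what would keep two 
records with the same short name in different namespaces apart.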


> writing unions with multiple records, fixed or enums can choose wrong branch 
> -----------------------------------------------------------------------------
>
>                 Key: AVRO-656
>                 URL: https://issues.apache.org/jira/browse/AVRO-656
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.4.0
>            Reporter: Doug Cutting
>            Assignee: Doug Cutting
>             Fix For: 1.5.0
>
>         Attachments: AVRO-656.patch, AVRO-656.patch
>
>
> According to the specification, a union may contain multiple instances of a 
> named type, provided they have different names.  There are several bugs in 
> the Java implementation of this when writing data:
>  - for record, only the short-name of the record is checked, so the branch 
> for a record of the same name in a different namespace may be used by mistake
>  - for enum and fixed, the name of the type is not checked, so the first 
> enum or fixed in the union will always be assumed when writing.  In many 
> cases this may cause the wrong data to be written, potentially corrupting 
> output.
> This is not a regression.  This has never been implemented correctly by Java. 
>  Python and Ruby never check names, but rather perform a full, recursive 
> validation of content.

