[ 
https://issues.apache.org/jira/browse/AVRO-656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978144#action_12978144
 ] 

Scott Carey commented on AVRO-656:
----------------------------------

bq.  But....this patch would be a major backwards-incompatible change to the 
spec. In our code, we're using the ["null", "fixed4", "fixed16"] case all 
the time to represent IPv4 or IPv6 addresses.

I have a lot of data written by Java in Avro Data Files with a similar union of 
two fixed types for IPv4 / IPv6 handling.  I have not run into the bug (yet) 
because we haven't used the IPv6 branch in real data.

If the spec changed, new code would still have to read old data serialized 
with such unions and persisted with those schemas in Avro Data Files, or I'd 
be stuck.  If it is still relevant next week, I might try Doug's patch and see 
what happens when it reads my data files.

I feel that the best approach here is to make the dynamic languages 
responsible, in their own APIs, for letting users decide how to handle data 
types that the language doesn't support or that are ambiguous with other 
types.  Douglas Creager mentions a few possibilities.  We don't have to 
require that a union use a wrapper, but we can use wrappers (or other 
techniques) optionally to disambiguate.

There will always be a language with a simpler type system than Avro's; its 
API will thus have some ambiguity and will need mechanisms for users to 
disambiguate when writing to an Avro schema that is richer than what is 
natural in that language.
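
As a rough illustration of the optional-wrapper idea, here is a minimal Python 
sketch.  It is not the Avro Python API: Tagged, matches, full_name, 
resolve_branch, the dict-based schemas and the "md5" branch are all 
hypothetical names made up for this example.  Untagged values are resolved 
structurally, and a wrapper is needed only when more than one branch accepts 
the value (raising on ambiguity is one possible policy, assumed here).

```python
from dataclasses import dataclass
from typing import Any, Optional, Sequence


@dataclass(frozen=True)
class Tagged:
    """Hypothetical wrapper naming the union branch the writer intends."""
    branch: str  # full name of a named type, e.g. "fixed16"
    value: Any


def matches(schema: dict, datum: Any) -> bool:
    """Structural check, covering only the schema kinds used in this sketch."""
    kind = schema["type"]
    if kind == "null":
        return datum is None
    if kind == "fixed":
        return isinstance(datum, bytes) and len(datum) == schema["size"]
    if kind == "enum":
        return isinstance(datum, str) and datum in schema["symbols"]
    return False


def full_name(schema: dict) -> Optional[str]:
    """Namespace-qualified name, or None for unnamed types such as null."""
    name = schema.get("name")
    if name is None:
        return None
    namespace = schema.get("namespace")
    return f"{namespace}.{name}" if namespace else name


def resolve_branch(union: Sequence[dict], datum: Any) -> int:
    """Return the index of the union branch to write for this datum."""
    if isinstance(datum, Tagged):
        # Explicit tag: match on the full name, then validate the payload.
        for i, schema in enumerate(union):
            if full_name(schema) == datum.branch and matches(schema, datum.value):
                return i
        raise ValueError(f"no branch named {datum.branch!r} accepts the value")
    # No tag: accept the datum only if exactly one branch matches structurally.
    candidates = [i for i, schema in enumerate(union) if matches(schema, datum)]
    if len(candidates) != 1:
        raise ValueError(
            f"{len(candidates)} branches match; wrap the datum in Tagged"
        )
    return candidates[0]


# Assumed example union: null, two address sizes, plus a second 16-byte fixed.
UNION = [
    {"type": "null"},
    {"type": "fixed", "name": "fixed4", "size": 4},
    {"type": "fixed", "name": "fixed16", "size": 16},
    {"type": "fixed", "name": "md5", "size": 16},
]
```

With this union, resolve_branch(UNION, bytes(4)) needs no wrapper, while a 
16-byte value has to be written as Tagged("fixed16", ...) or Tagged("md5", ...) 
because two branches accept it.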


> writing unions with multiple records, fixed or enums can choose wrong branch 
> -----------------------------------------------------------------------------
>
>                 Key: AVRO-656
>                 URL: https://issues.apache.org/jira/browse/AVRO-656
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.4.0
>            Reporter: Doug Cutting
>            Assignee: Doug Cutting
>             Fix For: 1.5.0
>
>         Attachments: AVRO-656.patch, AVRO-656.patch
>
>
> According to the specification, a union may contain multiple instances of a 
> named type, provided they have different names.  There are several bugs in 
> the Java implementation of this when writing data:
>  - for record, only the short-name of the record is checked, so the branch 
> for a record of the same name in a different namespace may be used by mistake
>  - for enum and fixed, the name of the type is not checked, so the first 
> enum or fixed in the union will always be assumed when writing.  In many 
> cases this may cause the wrong data to be written, potentially corrupting 
> output.
> This is not a regression.  This has never been implemented correctly by Java. 
>  Python and Ruby never check names, but rather perform a full, recursive 
> validation of content.
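
The short-name problem in the first bullet can be modeled in a few lines.  
This is a simplified Python model of the behavior described above, not the 
Java implementation; the function names and the "a.Address" / "b.Address" 
branch names are made up for illustration.

```python
from typing import Sequence


def short_name(full_name: str) -> str:
    """Strip the namespace: "b.Address" becomes "Address"."""
    return full_name.rsplit(".", 1)[-1]


def branch_by_short_name(branches: Sequence[str], datum_type: str) -> int:
    """Models the buggy lookup: only the short name is compared."""
    target = short_name(datum_type)
    for i, name in enumerate(branches):
        if short_name(name) == target:
            return i
    raise ValueError(f"{datum_type!r} is not in the union")


def branch_by_full_name(branches: Sequence[str], datum_type: str) -> int:
    """Compares the namespace-qualified name, so namespaces are respected."""
    for i, name in enumerate(branches):
        if name == datum_type:
            return i
    raise ValueError(f"{datum_type!r} is not in the union")


# Two records with the same short name in different namespaces (assumed).
BRANCHES = ["a.Address", "b.Address"]
```

For a datum of type "b.Address", branch_by_short_name picks index 0 (the 
"a.Address" branch), while branch_by_full_name picks index 1.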

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.