[
https://issues.apache.org/jira/browse/AVRO-973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205664#comment-13205664
]
Douglas Kaminsky commented on AVRO-973:
---------------------------------------
And to expound on my previous comment:
No change to the current validation-based mechanism can guarantee
correctness for record types. For example, consider complex numeric types
that are similar in structure but distinct in meaning:
Amount { "mantissa" : "string", "exp" : "string" }
Money { "mantissa" : "string", "exp" : "string",
        "currency" : {"type" : "string", "default" : "USD"} }
This is a trivial example, but believe me when I tell you that we have 209
types in our schema and several build on each other.
I contend that a correct implementation should behave the same regardless of
union order, i.e. serializing against ["null", "Amount", "Money"] should yield
the same result as serializing against ["null", "Money", "Amount"].
Now suppose I serialize the datum:
{ "mantissa" : "314159", "exp" : "-5" }
* If you validate without the break, the last matching branch wins, so this
will serialize as "Money" against ["null", "Amount", "Money"] but as "Amount"
against ["null", "Money", "Amount"]
* With the break, the first matching branch wins, so this will serialize as
"Amount" against ["null", "Amount", "Money"] but as "Money" against
["null", "Money", "Amount"]
Either way, the intention of the message sender is lost.
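The order dependence described above can be reproduced with a small
standalone sketch. It does not use the Avro library: the schema dictionaries,
validate() and resolve() below are hypothetical stand-ins, and the validator
assumes (as the example above does) that a field with a default may be omitted
from the datum.

```python
# Hypothetical stand-ins for the two record schemas in the comment.
# Each field maps to True if it is required (has no default).
AMOUNT = {"name": "Amount", "fields": {"mantissa": True, "exp": True}}
MONEY = {"name": "Money",
         "fields": {"mantissa": True, "exp": True, "currency": False}}
NULL = {"name": "null", "fields": None}


def validate(schema, datum):
    """Structural check only; assumes a field with a default may be omitted."""
    if schema["fields"] is None:
        return datum is None
    if not isinstance(datum, dict):
        return False
    fields = schema["fields"]
    has_required = all(name in datum
                       for name, required in fields.items() if required)
    no_unknown = all(name in fields for name in datum)
    return has_required and no_unknown


def resolve(union, datum, stop_at_first_match):
    """Return the name of the branch chosen by a write_union-style loop."""
    chosen = None
    for candidate in union:
        if validate(candidate, datum):
            chosen = candidate["name"]
            if stop_at_first_match:
                break  # the behaviour proposed in the patch
    return chosen


datum = {"mantissa": "314159", "exp": "-5"}
amount_first = [NULL, AMOUNT, MONEY]
money_first = [NULL, MONEY, AMOUNT]

for label, with_break in (("without break", False), ("with break", True)):
    print(label,
          resolve(amount_first, datum, with_break),
          resolve(money_first, datum, with_break))
```

Under these assumptions the datum matches both records structurally, so the
chosen branch flips with the union order in both variants of the loop.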
> Union behavior not consistent
> -----------------------------
>
> Key: AVRO-973
> URL: https://issues.apache.org/jira/browse/AVRO-973
> Project: Avro
> Issue Type: Bug
> Components: python
> Affects Versions: 1.6.1, 1.6.2
> Reporter: Gaurav Nanda
> Labels: patch
> Attachments: AVRO-973-patch-1.patch, AVRO-973-patch-2.patch,
> AVRO-973-patch-3.patch, AVRO-973-wrapper.patch, AVRO-973-wrapper.patch,
> test_unions.py
>
> Original Estimate: 0.25h
> Remaining Estimate: 0.25h
>
> Python's union does not respect the order in which types are specified.
> For the following schema:
> {"type":"map","values":["int","long","float","double","string","boolean"]},
> an integer value is written as a double, but the writer should respect the
> order in which the types are specified.
> Fixed Code (io.py):
> def write_union(self, writers_schema, datum, encoder):
>     """
>     A union is encoded by first writing a long value indicating
>     the zero-based position within the union of the schema of its value.
>     The value is then encoded per the indicated schema within the union.
>     """
>     # resolve union
>     index_of_schema = -1
>     for i, candidate_schema in enumerate(writers_schema.schemas):
>         if validate(candidate_schema, datum):
>             index_of_schema = i
>             break  # XXX Add break statement here XXX
>     if index_of_schema < 0: raise AvroTypeException(writers_schema, datum)