Re: Schema Resolution for Enums

graham sanderson Thu, 26 Sep 2013 07:31:28 -0700

Ah, I didn't realize you were writing an implementation. The spec's schema 
resolution rules say:

  if both are enums:
  if the writer's symbol is not present in the reader's enum, then an error is 
  signalled.


True, it doesn't explicitly say that you should map the writer's symbol onto 
the reader's enum to read the correct value, but I would imagine that is 
implied. That said, I only have actual experience with the Java implementation, 
which does indeed do the mapping when doing schema resolution.
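
To make the mapping concrete, here is a minimal sketch in Python (not the Java 
implementation's code, and not any Avro library's API; the function name 
resolve_enum and its signature are hypothetical). It assumes the index has 
already been decoded from the wire using the writer's schema, and it resolves 
by symbol name rather than by position:

```python
def resolve_enum(writer_index, writer_symbols, reader_symbols):
    """Map an enum index written with the writer's schema onto the reader's enum.

    Hypothetical helper for illustration only. Returns (symbol, reader_index).
    Raises ValueError if the writer's symbol is not present in the reader's
    enum, which is the error case the spec text quoted above describes.
    """
    # The encoded value is an index into the WRITER's symbol list.
    symbol = writer_symbols[writer_index]
    try:
        # Resolve by name, so re-ordering in the reader schema is harmless.
        reader_index = reader_symbols.index(symbol)
    except ValueError:
        raise ValueError(f"symbol {symbol!r} not in reader's enum") from None
    return symbol, reader_index


writer = ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"]
reader = ["HEARTS", "SPADES", "DIAMONDS", "CLUBS"]
print(resolve_enum(0, writer, reader))  # ('SPADES', 1)
```

With the two Suit schemas from this thread, an encoded zero comes back as 
SPADES (index 1 in the reader's enum), not HEARTS.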

On Sep 26, 2013, at 8:29 AM, Youssef Hatem <[email protected]> wrote:

> I am working with a team on implementing the Avro specification. I found 
> this underspecification while writing unit tests.
> 
> All I can say is that we use the deserialized enum value as it is. So we get 
> zero, which is wrong since zero corresponds to something else in the reader 
> enum.
> 
> "you should get "SPADES" back when you read assuming Avro knows about both 
> the writer and the reader schema (and thus can correctly interpret "zero")"
> This means we should implement some kind of mapping between reader/writer 
> schema for enum values such that when we deserialize 0 we map it to 1 in this 
> case. Otherwise we will probably get wrong values, am I correct? Thanks in 
> advance.
> 
> On Sep 26, 2013, at 15:11 , graham sanderson wrote:
> 
>> "According to writer schema a zero will be encoded. However when 
>> deserialized using the aforementioned reader schema the datum will 
>> correspond to HEARTS since the order of symbols has changed in the reader 
>> schema and HEARTS is the first."
>> 
>> Have you observed this problem - do you have a code snippet for how you are 
>> reading? The schemas are indeed compatible despite the re-ordering, and you 
>> should get "SPADES" back when you read assuming Avro knows about both the 
>> writer and the reader schema (and thus can correctly interpret "zero")
>> 
>> On Sep 26, 2013, at 7:57 AM, Youssef Hatem <[email protected]> 
>> wrote:
>> 
>>> Hello,
>>> 
>>> According to Avro Standard 1.7.5 enums match when names match and all 
>>> writer symbols exist in reader schema.
>>> 
>>> According to this definition the following writer schema:
>>> { "type": "enum", "name":"Suit","symbols":["SPADES", "HEARTS", "DIAMONDS", 
>>> "CLUBS"]}
>>> matches this reader schema:
>>> { "type": "enum", "name":"Suit","symbols":["HEARTS", "SPADES", "DIAMONDS", 
>>> "CLUBS"]}
>>> 
>>> However, this can lead to semantic problems. Assume I have the following 
>>> datum:
>>> Suit suit = SPADES
>>> 
>>> According to writer schema a zero will be encoded. However when 
>>> deserialized using the aforementioned reader schema the datum will 
>>> correspond to HEARTS since the order of symbols has changed in the reader 
>>> schema and HEARTS is the first.
>>> 
>>> My question is: why doesn't the standard explicitly add another constraint 
>>> for matching enums, such as requiring that the order of elements be 
>>> preserved, or is it implied?
>>> 
>>> Best regards,
>>> Youssef Hatem
>> 
>
