[ 
https://issues.apache.org/jira/browse/AVRO-3631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17649103#comment-17649103
 ] 

Ten edited comment on AVRO-3631 at 12/19/22 12:17 AM:
------------------------------------------------------

It looks like this issue essentially comes down to the fact that we assume it's 
always possible to convert a Rust struct into an Avro value deterministically, 
but structs that get serialized through Serde can be serialized into different 
kinds of Avro values. This applies to [u8] and co, to maps that may be 
serialized as records (serde flatten support), to structs that may be 
serialized as maps, to Union{null, Something}, which could probably be 
serialized as null if the field is missing, to integers that may also be 
serialized as timestamps, and to smaller integers that can arguably be 
upcast...

Overall I think this means that when serializing from the Serde framework, we 
should have knowledge of the schema inside the serializer (and recursively 
iterate inside it as we move within the serialization, by putting references to 
it in the intermediate serializer objects), and I don't think this should be 
avoided at all costs. This would give a much more flexible mapping, and I can't 
think of a scenario where the schema wouldn't be available at this step - it 
doesn't make much sense to generate an Avro value from a struct if you have no 
idea what the schema will be anyway.
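
As a rough sketch of what I mean by a schema-aware serializer (this is not the 
actual apache_avro code: the `Schema` and `Value` enums below are simplified 
stand-ins, and `bytes_to_value` is a hypothetical helper), the same byte slice 
can map to different Avro values depending on the schema node it is serialized 
against:

```rust
// Hypothetical, simplified stand-ins for apache_avro's `Schema` and
// `types::Value`; the real types have many more variants.
#[derive(Debug, Clone, PartialEq)]
enum Schema {
    Int,
    Bytes,
    Fixed { size: usize },
    Array(Box<Schema>),
}

#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i32),
    Bytes(Vec<u8>),
    Fixed(usize, Vec<u8>),
    Array(Vec<Value>),
}

/// Convert a byte slice into the kind of Avro value the schema asks for.
/// A serializer that carries a `&Schema` could make this choice each time
/// it descends into a field, instead of always producing an array of ints.
fn bytes_to_value(schema: &Schema, data: &[u8]) -> Result<Value, String> {
    match schema {
        Schema::Bytes => Ok(Value::Bytes(data.to_vec())),
        Schema::Fixed { size } if *size == data.len() => {
            Ok(Value::Fixed(*size, data.to_vec()))
        }
        Schema::Fixed { size } => Err(format!(
            "fixed schema expects {} bytes, got {}",
            size,
            data.len()
        )),
        // Only an array of ints is handled in this sketch.
        Schema::Array(items) if **items == Schema::Int => Ok(Value::Array(
            data.iter().map(|b| Value::Int(i32::from(*b))).collect(),
        )),
        other => Err(format!("cannot serialize bytes as {:?}", other)),
    }
}
```

With something like this, a `[u8; 6]` field would become `Value::Fixed` under a 
`fixed` schema of size 6, `Value::Bytes` under `bytes`, and an array of ints 
only when the schema really is an array of ints.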

 

It looks like to some extent the same applies to deserialization (one may want 
to turn Avro schema into structs based on the actual Avro type) - although I 
can't think of a case apart from constructing `types::Value` itself, and I'm 
not sure what this is even useful for in practice.

 

(This may also be the occasion to move to zero-alloc serialization, by not 
using types::Value as intermediate)


was (Author: JIRAUSER288176):
It looks like this issue essentially comes down to the fact that we assume it's 
always possible to convert a Rust struct into an Avro value 
deterministically, but the truth is that structs that get serialized through 
Serde can be serialized into different kinds of Avro values - this applies for 
[u8] and co, but also for Union{null, Something} which probably may be 
serialized as null if the field is missing, integers that may also be 
serialized into timestamps...

Overall I think this means that when serializing from Serde framework, we 
should have knowledge of the schema inside the deserializer (and recursively 
iterate inside it as we move within the serialization, by putting references to 
it in the intermediate serializer objects), and I don't think it should be 
avoided at all cost. This would give much more flexibility.

 

It looks like to some extent the same applies to deserialization (one may want 
to turn Avro schema into structs based on the actual Avro type) - although I 
can't think of a case apart from constructing `types::Value` itself, and I'm 
not sure what this is even useful for in practice.

 

(This may also be the occasion to move to zero-alloc serialization, by not 
using types::Value as intermediate)

> Fix serialization of structs containing Fixed fields
> ----------------------------------------------------
>
>                 Key: AVRO-3631
>                 URL: https://issues.apache.org/jira/browse/AVRO-3631
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: rust
>            Reporter: Rik Heijdens
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> Consider the following minimal Avro Schema:
> {noformat}
> {
>     "type": "record",
>     "name": "TestStructFixedField",
>     "fields": [
>         {
>             "name": "field",
>             "type": {
>                 "name": "field",
>                 "type": "fixed",
>                 "size": 6
>             }
>         }
>     ]
> }
> {noformat}
> In Rust, I might represent this schema with the following struct:
> {noformat}
> #[derive(Debug, Serialize, Deserialize)]
> struct TestStructFixedField {
>     field: [u8; 6]
> }
> {noformat}
> I would then expect to be able to use `apache_avro::to_avro_datum()` to 
> convert an instance of `TestStructFixedField` into a `Vec<u8>` using an 
> instance of `Schema` initialized from the schema listed above.
> However, this fails because the `Value` produced by `apache_avro::to_value()` 
> represents `field` as a `Value::Array<Value::Int>` rather than a 
> `Value::Fixed<6, Vec<u8>>`, which does not pass schema validation.
> I believe that there are two options to fix this:
> 1. Allow Value::Array<Vec<Value::Int>> to pass validation if the array has 
> the expected length, and none of the contents of the array are out-of-range 
> for u8. If we go down this route, the implementation of `to_avro_datum()` 
> will have to take care of converting Value::Int to u8 when converting into 
> bytes.
> 2. Update `apache_avro::to_value()` such that fixed length arrays are 
> converted into `Value::Fixed<N, Vec<u8>>` rather than `Value::Array`.
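
For illustration, option 1 above could look roughly like the following sketch. 
It does not use the actual apache_avro types: `Value` is a simplified stand-in 
and `array_to_fixed` is a hypothetical helper that does the length and range 
checks and the int-to-u8 conversion in one place.

```rust
use std::convert::TryFrom;

// Hypothetical, simplified stand-in for apache_avro's `types::Value`.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i32),
    Array(Vec<Value>),
    Fixed(usize, Vec<u8>),
}

/// Accept an array of ints for a `fixed` schema of `size` bytes when the
/// length matches and every item fits in a `u8`; otherwise return `None`.
fn array_to_fixed(value: &Value, size: usize) -> Option<Value> {
    let items = match value {
        Value::Array(items) if items.len() == size => items,
        _ => return None,
    };
    let bytes = items
        .iter()
        .map(|item| match item {
            // `try_from` rejects negative values and values above 255.
            Value::Int(i) => u8::try_from(*i).ok(),
            _ => None,
        })
        .collect::<Option<Vec<u8>>>()?;
    Some(Value::Fixed(size, bytes))
}
```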



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
