[ 
https://issues.apache.org/jira/browse/AVRO-3631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17649103#comment-17649103
 ] 

Ten edited comment on AVRO-3631 at 12/18/22 11:07 PM:
------------------------------------------------------

It looks like this issue essentially comes down to the fact that we assume it 
is always possible to convert a Rust struct into an Avro value 
deterministically. In practice, a struct that goes through Serde can be 
serialized into different kinds of Avro values. This applies to [u8] and 
similar types, but also to Union{null, Something}, which may be serialized as 
null if the field is missing, and to integers, which may also be serialized as 
timestamps.

Overall I think this means that when serializing through the Serde framework, 
we should have knowledge of the schema inside the serializer (and recursively 
descend into it as we move through the serialization, by putting references to 
it in the intermediate serializer objects), and I don't think this should be 
avoided at all costs. It would give much more flexibility.
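
To illustrate the idea, here is a minimal sketch of a schema-guided conversion. 
`MiniSchema`, `MiniValue` and `bytes_to_value` are hypothetical, heavily 
simplified stand-ins that I made up for this example, not the crate's actual 
types or API. The point is only that the same Rust data (an optional byte 
sequence) ends up as a different value depending on the schema that is passed 
down, and that a union is handled by recursing into its branches.

```rust
/// Hypothetical, simplified stand-in for an Avro schema (not the crate's type).
#[derive(Debug, Clone, PartialEq)]
enum MiniSchema {
    Null,
    Int,
    Bytes,
    Fixed { size: usize },
    Array(Box<MiniSchema>),
    Union(Vec<MiniSchema>),
}

/// Hypothetical, simplified stand-in for an Avro value (not the crate's type).
#[derive(Debug, Clone, PartialEq)]
enum MiniValue {
    Null,
    Int(i32),
    Bytes(Vec<u8>),
    Fixed(usize, Vec<u8>),
    Array(Vec<MiniValue>),
    Union(u32, Box<MiniValue>),
}

/// Convert an optional byte sequence into whatever value the schema asks for.
fn bytes_to_value(schema: &MiniSchema, data: Option<&[u8]>) -> Result<MiniValue, String> {
    match (schema, data) {
        (MiniSchema::Null, None) => Ok(MiniValue::Null),
        (MiniSchema::Bytes, Some(b)) => Ok(MiniValue::Bytes(b.to_vec())),
        // A fixed only accepts data of exactly the declared size.
        (MiniSchema::Fixed { size }, Some(b)) if b.len() == *size => {
            Ok(MiniValue::Fixed(*size, b.to_vec()))
        }
        // An array of ints is still possible if the schema says so.
        (MiniSchema::Array(items), Some(b)) if **items == MiniSchema::Int => Ok(
            MiniValue::Array(b.iter().map(|&x| MiniValue::Int(i32::from(x))).collect()),
        ),
        // For a union, recurse into each branch and keep the first that fits.
        (MiniSchema::Union(branches), _) => {
            for (index, branch) in branches.iter().enumerate() {
                if let Ok(value) = bytes_to_value(branch, data) {
                    return Ok(MiniValue::Union(index as u32, Box::new(value)));
                }
            }
            Err(format!("no branch of {:?} accepts {:?}", branches, data))
        }
        _ => Err(format!("{:?} cannot be serialized as {:?}", data, schema)),
    }
}
```

With this sketch, a `[u8; 6]` passed as a slice becomes a `Fixed` under a fixed 
schema of size 6, a `Bytes` under a bytes schema, and branch 1 of a 
Union{null, fixed} schema, while `None` becomes branch 0 of that same union.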

 

To some extent the same seems to apply to deserialization (one may want to 
turn Avro data into structs based on the actual Avro type), although I can't 
think of a case apart from constructing `types::Value` itself, and I'm not 
sure how useful that would be in practice.

 

(This may also be the occasion to move to zero-alloc serialization, by not 
using `types::Value` as an intermediate representation.)
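
As a rough sketch of what skipping the intermediate value could look like, the 
code below writes a byte field straight into a caller-provided buffer, guided 
by the schema. `ByteSchema` and `write_bytes_field` are hypothetical names 
invented for this example. The encoding itself follows the Avro binary 
format: a fixed is written as its raw bytes with no length prefix, bytes are 
written as a long length followed by the data, and a long uses zigzag 
variable-length encoding.

```rust
/// Append an Avro `long` (zigzag, then variable-length encoding) to `out`.
fn write_long(n: i64, out: &mut Vec<u8>) {
    let mut z = ((n << 1) ^ (n >> 63)) as u64;
    while z >= 0x80 {
        // Emit the low 7 bits and set the continuation bit.
        out.push(((z & 0x7f) as u8) | 0x80);
        z >>= 7;
    }
    out.push(z as u8);
}

/// Hypothetical schema subset covering only the two byte-like Avro types.
enum ByteSchema {
    Bytes,
    Fixed(usize),
}

/// Encode `data` directly into `out`, without building an intermediate value.
fn write_bytes_field(schema: &ByteSchema, data: &[u8], out: &mut Vec<u8>) -> Result<(), String> {
    match schema {
        ByteSchema::Fixed(size) if data.len() == *size => {
            out.extend_from_slice(data);
            Ok(())
        }
        ByteSchema::Fixed(size) => Err(format!(
            "expected {} bytes for fixed, got {}",
            size,
            data.len()
        )),
        ByteSchema::Bytes => {
            write_long(data.len() as i64, out);
            out.extend_from_slice(data);
            Ok(())
        }
    }
}
```

The output buffer can be reused across calls, so the only allocation left in 
this sketch is the growth of that buffer.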



> Fix serialization of structs containing Fixed fields
> ----------------------------------------------------
>
>                 Key: AVRO-3631
>                 URL: https://issues.apache.org/jira/browse/AVRO-3631
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: rust
>            Reporter: Rik Heijdens
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Consider the following minimal Avro Schema:
> {noformat}
> {
>     "type": "record",
>     "name": "TestStructFixedField",
>     "fields": [
>         {
>             "name": "field",
>             "type": {
>                 "name": "field",
>                 "type": "fixed",
>                 "size": 6
>             }
>         }
>     ]
> }
> {noformat}
> In Rust, I might represent this schema with the following struct:
> {noformat}
> #[derive(Debug, Serialize, Deserialize)]
> struct TestStructFixedField {
>     field: [u8; 6]
> }
> {noformat}
> I would then expect to be able to use `apache_avro::to_avro_datum()` to 
> convert an instance of `TestStructFixedField` into a `Vec<u8>` using an 
> instance of `Schema` initialized from the schema listed above.
> However, this fails because the `Value` produced by `apache_avro::to_value()` 
> represents `field` as a `Value::Array<Value::Int>` rather than a 
> `Value::Fixed<6, Vec<u8>>`, which does not pass schema validation.
> I believe that there are two options to fix this:
> 1. Allow Value::Array<Vec<Value::Int>> to pass validation if the array has 
> the expected length, and none of the contents of the array are out-of-range 
> for u8. If we go down this route, the implementation of `to_avro_datum()` 
> will have to take care of converting Value::Int to u8 when converting into 
> bytes.
> 2. Update `apache_avro::to_value()` such that fixed length arrays are 
> converted into `Value::Fixed<N, Vec<u8>>` rather than `Value::Array`.
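
For reference, the check described in option 1 of the quoted description could 
look roughly like the sketch below. `Val` and `array_as_fixed` are 
hypothetical names on a simplified value type, not the crate's `Value` or any 
existing function. The sketch accepts an array only if it has the expected 
length and every element is an int that fits in a u8.

```rust
use std::convert::TryFrom;

/// Hypothetical, simplified value type with just the two variants needed here.
#[derive(Debug, Clone, PartialEq)]
enum Val {
    Int(i32),
    Array(Vec<Val>),
}

/// Return the bytes of a fixed of length `size` if `value` is an array of
/// exactly `size` ints that all fit in a u8, and `None` otherwise.
fn array_as_fixed(value: &Val, size: usize) -> Option<Vec<u8>> {
    match value {
        Val::Array(items) if items.len() == size => items
            .iter()
            .map(|item| match item {
                // Out-of-range ints (negative or above 255) are rejected.
                Val::Int(i) => u8::try_from(*i).ok(),
                _ => None,
            })
            .collect(),
        _ => None,
    }
}
```

Collecting into `Option<Vec<u8>>` makes the whole conversion fail as soon as 
one element is not a valid byte, which matches the validation rule in option 1.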



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
