jecsand838 commented on PR #8930:
URL: https://github.com/apache/arrow-rs/pull/8930#issuecomment-3775597328

   @mzabaluev 
   
   > > I don't think this is a bug in the async reader. You are using a testing 
infrastructure build around Arrow schemas which have the reader schema in the 
metadata, but you did not provide the schema in yours.
   > 
   > My test provides the Arrow reader schema and the top-level Avro record 
name in the metadata, which should be sufficient. The problem was in a schema 
mismatch: in the file, the array elements are not nullable.
   
   I think this is a schema resolution bug, based on a quick glance over the 
details you provided.
   
   That being said, there are limitations with using `AvroSchema::try_from` to 
create a reader schema. For now, my recommendation for creating a reader schema 
(especially a more complicated one) is to do one of the following:
   1. Modify the writer schema's JSON.
   2. Manually craft the JSON for an `AvroSchema`.
   3. Use `AvroSchema::try_from`, but sanitize the output and embed it into a 
pre-defined JSON wrapper if needed.

   The `AvroSchema::try_from` method was originally built for the Writer, so 
that a correct `AvroSchema` is inferred from an Arrow `Schema` when no 
`AvroSchema` is provided.
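
   As a rough illustration of option 2, here is a minimal sketch that 
hand-builds reader schema JSON for a record with a single array field. It uses 
only the standard library; the function name, record name, and field name are 
hypothetical and not taken from the PR, and passing the resulting string to 
`AvroSchema` is left out on purpose.

   ```rust
   /// Hypothetical helper: builds Avro schema JSON for a record that has one
   /// array-of-int field. `nullable_items` controls whether the array items
   /// are the union ["null","int"] or plain "int".
   fn reader_schema_json(record_name: &str, field_name: &str, nullable_items: bool) -> String {
       // Choose the Avro type used for the array elements.
       let items = if nullable_items {
           r#"["null","int"]"#
       } else {
           r#""int""#
       };
       // Doubled braces are literal braces in a format string.
       format!(
           r#"{{"type":"record","name":"{record_name}","fields":[{{"name":"{field_name}","type":{{"type":"array","items":{items}}}}}]}}"#
       )
   }

   fn main() {
       // Assumed names, for illustration only.
       let json = reader_schema_json("topLevelRecord", "values", false);
       println!("{json}");
   }
   ```

   Writing the JSON by hand keeps the element nullability and the record name 
explicit, so the reader schema can be made to match what the file declares.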
   
   The biggest challenge to overcome is the lossy behavior inherent to 
Arrow -> Avro schema conversion, e.g. Arrow has no concept of named types.
   
   @EmilyMatt 
   
   > The issue is probably in the AvroSchema::from
   it has various bugs I've also encountered.
   
   100%, it's absolutely not related to this PR. Sorry for not jumping in 
sooner to call that out.
   
   ---
   
   As an aside, I just created #9233, which proposes an approach for 
modularizing `schema.rs`, adding an `ArrowToAvroSchemaBuilder`, and enhancing 
the overall `AvroSchema` conversion functionality. I'd love to get some 
feedback if either of you gets an opportunity!

