jecsand838 commented on PR #8930: URL: https://github.com/apache/arrow-rs/pull/8930#issuecomment-3775597328
@mzabaluev

> > I don't think this is a bug in the async reader. You are using a testing infrastructure built around Arrow schemas which have the reader schema in the metadata, but you did not provide the schema in yours.
>
> My test provides the Arrow reader schema and the top-level Avro record name in the metadata, which should be sufficient. The problem was in a schema mismatch: in the file, the array elements are not nullable.

Based on a quick glance over the details you provided, I think this is a schema resolution bug. That being said, there are limitations with using `AvroSchema::try_from` to create a reader schema. For now, my recommendation for creating a reader schema (especially a more complicated one) is to do one of the following:

1. Modify the writer schema's JSON.
2. Manually craft the JSON for an `AvroSchema`.
3. Use `AvroSchema::try_from`, but sanitize the output and embed it into a pre-defined JSON wrapper if needed.

The `AvroSchema::try_from` method was originally built for the Writer, so that a correct `AvroSchema` is inferred from an Arrow `Schema` when no `AvroSchema` is provided. The biggest challenge to overcome is the lossiness inherent in Arrow -> Avro schema conversion, e.g. Arrow has no concept of named types.

@EmilyMatt

> The issue is probably in the `AvroSchema::from`; it has various bugs I've also encountered.

100%, it's absolutely not related to this PR. Sorry about not jumping in sooner to call that out.

---

As an aside, I just created #9233, which proposes an approach for modularizing `schema.rs`, adding an `ArrowToAvroSchemaBuilder`, and enhancing the overall `AvroSchema` conversion functionality. I'd love to get some feedback if either of you gets a chance!
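Why non-nullable array items in the file should still be readable with a nullable-items reader schema: the Avro specification says that when the reader's schema is a union and the writer's is not, the first branch of the reader's union that matches the writer's schema is resolved against it. So a writer item type `"int"` should resolve against a reader item type `["null", "int"]`. The sketch below models a small subset of those rules in std-only Rust. The `Schema` enum, `shallow_match`, and `resolve` are hypothetical illustrations, not arrow-avro's internal types or API. The writer-union case is handled conservatively at the schema level (every writer branch must resolve), whereas the spec resolves unions per datum.

```rust
/// Hypothetical, tiny model of a few Avro schema shapes (not arrow-avro types).
#[derive(Debug, Clone, PartialEq)]
enum Schema {
    Null,
    Int,
    Long,
    Double,
    String,
    Array(Box<Schema>),
    Union(Vec<Schema>),
}

/// Shallow "match" from the Avro spec: same primitive type, a legal
/// promotion (subset: int->long, int->double, long->double), or both arrays.
/// Array item types are checked later, during recursive resolution.
fn shallow_match(writer: &Schema, reader: &Schema) -> bool {
    use Schema::*;
    match (writer, reader) {
        (Array(_), Array(_)) => true,
        (Int, Long) | (Int, Double) | (Long, Double) => true,
        (w, r) => w == r && !matches!(w, Union(_)),
    }
}

/// Schema-level resolution check for the subset modeled above.
fn resolve(writer: &Schema, reader: &Schema) -> Result<(), String> {
    use Schema::*;
    match (writer, reader) {
        // Conservative: every writer union branch must resolve against the reader.
        (Union(ws), _) => ws.iter().try_for_each(|w| resolve(w, reader)),
        // Reader union, writer not: use the first reader branch that matches.
        (_, Union(rs)) => match rs.iter().find(|r| shallow_match(writer, r)) {
            Some(r) => resolve(writer, r),
            None => Err(format!("no reader union branch matches writer {writer:?}")),
        },
        // Arrays resolve when their item schemas resolve.
        (Array(w), Array(r)) => resolve(w, r),
        _ if shallow_match(writer, reader) => Ok(()),
        _ => Err(format!("writer {writer:?} does not resolve to reader {reader:?}")),
    }
}

fn main() {
    use Schema::*;
    // Writer (the file): array items are NOT nullable.
    let writer = Array(Box::new(Int));
    // Reader: array items are nullable, i.e. ["null", "int"] in Avro JSON.
    let reader = Array(Box::new(Union(vec![Null, Int])));
    println!("{:?}", resolve(&writer, &reader)); // Ok(())

    // A string item type cannot be read as ["null", "int"].
    let bad_writer = Array(Box::new(String));
    println!("{:?}", resolve(&bad_writer, &reader).is_err()); // true
}
```

Under these rules, a reader that only differs from the writer by adding `"null"` to the item type is a valid resolution, so a failure in that case points at the resolution logic rather than at the caller's schema.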
