msalib opened a new issue, #4023: URL: https://github.com/apache/arrow-rs/issues/4023
**Describe the bug**

Let's say you're trying to async read a Parquet file on S3, and that file has key/value metadata (like "created by"). There's an inconsistency: `ParquetRecordBatchStream::schema` produces a `Schema` object that includes that metadata, but the `RecordBatch`es yielded by `ParquetRecordBatchStream` carry schema objects that don't. The problem is that if you create an `ArrowWriter` using the first schema and then try to write batches from the stream to it, the schemas won't match: the writer expects the metadata, but each batch has a schema without it. A minimal reproduction sketch is included at the end of this report.

**Expected behavior**

I'd expect one of the following:

* `ParquetRecordBatchStream::schema` produces a `Schema` without metadata, or
* the `RecordBatch`es produced by `ParquetRecordBatchStream` have exactly the same schema as what `::schema` returns, or
* `ArrowWriter` tolerates its supplied schema differing only in metadata from the batch schemas passed to `write()`
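Below is a minimal sketch of the reproduction path. To keep it self-contained it reads from a local file via `tokio::fs::File` instead of S3; `input.parquet` and `output.parquet` are placeholder paths, and the `parquet` crate is assumed to be built with its `arrow` and `async` features:

```rust
use futures::TryStreamExt;
use parquet::arrow::{ArrowWriter, ParquetRecordBatchStreamBuilder};
use tokio::fs::File;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Placeholder path: any Parquet file whose footer carries
    // key/value metadata (e.g. "created by").
    let input = File::open("input.parquet").await?;
    let mut stream = ParquetRecordBatchStreamBuilder::new(input)
        .await?
        .build()?;

    // The stream reports a schema that includes the file metadata.
    let stream_schema = stream.schema().clone();

    // A writer built from that schema expects the same metadata on
    // every batch it is given.
    let output = std::fs::File::create("output.parquet")?;
    let mut writer = ArrowWriter::try_new(output, stream_schema, None)?;

    while let Some(batch) = stream.try_next().await? {
        // Per this report, the yielded batches carry schemas without the
        // metadata, so this write fails with a schema mismatch error.
        writer.write(&batch)?;
    }
    writer.close()?;
    Ok(())
}
```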
