nevi-me commented on issue #252: URL: https://github.com/apache/arrow-rs/issues/252#issuecomment-836925868
> Perhaps we could make the simplifying assumption and say "the arrow schema is supposed to be the same for all records, and thus we assume the metadata that applies to all the rows should be the same as well"?

If I think about how the IPC format works, we send the schema first, and then send the batches after it. The batches don't carry a copy of the schema; they contain just the buffers making up the data. So my thinking is that we:

* Write the schema of the Arrow data to `FileMetaData`
* Write the schema of each field to `ColumnMetaData`
* Use the schema provided when the writer is constructed, and not the one from each `ArrowWriter::write(batch: &RecordBatch)` call.

I can't think of a valid use case where we'd expect the metadata of a stream of Arrow data (at the schema or field level) to change mid-stream. I don't think we'd even be able to communicate such a scenario with `arrow-flight`. I wonder, though, whether Parquet ordinarily handles a scenario where the metadata per file is different 🤔
