alamb commented on PR #4800: URL: https://github.com/apache/arrow-rs/pull/4800#issuecomment-1711981166
> If the metadata is inconsistent how does it know which metadata to preserve?

Right now the `Schema` of the output `RecordBatch` is the schema that was provided by the caller as the first argument to `concat_batches`.

To summarize a conversation I had with @tustvold over Slack:

1. I believe his core concern with this PR is that making this check more lax likely papers over what some people might perceive as a bug in the caller (in this case, inconsistent metadata).
2. An alternative interpretation is that by checking for exactly the same schema for all input batches, the `concat_batches` kernel is imposing a particular definition of schema equality and enforcing an invariant that might not be what other systems have in mind. From this point of view, removing the `Schema` equality check entirely might be appropriate.

I am sure I can fix my particular problem (see https://github.com/influxdata/influxdb_iox/pull/8691/files#r1319044861) at another level of the stack (e.g. in DataFusion), but it didn't feel right to me that `concat_batches` was enforcing some particular invariant that is not enforced elsewhere.
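
To make the two interpretations of schema equality concrete, here is a minimal sketch in Rust. It deliberately uses hypothetical stand-in types (`Field`, `Schema`) and helper names (`strictly_equal`, `fields_equal`, `schema_with_metadata`) that are assumptions for illustration only; it is not the arrow-rs API and not the check implemented in this PR.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for illustration only; these are NOT the arrow-rs types.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: String,
    nullable: bool,
}

#[derive(Debug, Clone, PartialEq)]
struct Schema {
    fields: Vec<Field>,
    metadata: HashMap<String, String>,
}

/// Strict interpretation: fields and metadata must both match exactly.
fn strictly_equal(a: &Schema, b: &Schema) -> bool {
    a == b
}

/// Lax interpretation: only the fields must match; metadata is ignored.
fn fields_equal(a: &Schema, b: &Schema) -> bool {
    a.fields == b.fields
}

/// Build a one-column schema that differs only in its metadata (hypothetical helper).
fn schema_with_metadata(pairs: &[(&str, &str)]) -> Schema {
    Schema {
        fields: vec![Field {
            name: "a".to_string(),
            data_type: "Int32".to_string(),
            nullable: false,
        }],
        metadata: pairs
            .iter()
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect(),
    }
}

fn main() {
    let caller = schema_with_metadata(&[("origin", "caller")]);
    let batch = schema_with_metadata(&[("origin", "batch")]);

    // Same columns, different metadata: the two definitions disagree.
    println!("strict: {}", strictly_equal(&caller, &batch));
    println!("fields only: {}", fields_equal(&caller, &batch));
}
```

Under the strict definition, two batches whose columns are identical are rejected purely because of metadata; under the lax definition they are accepted, and the question of which metadata survives is settled by whichever schema the caller passes in.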
