tustvold commented on PR #6855: URL: https://github.com/apache/arrow-rs/pull/6855#issuecomment-2528255451
> This is a DataFusion limitation rather than an Arrow limitation, but DataFusion uses Arrow's RecordBatch.

> It would be nice eventually if DataFusion would just require the logical schema to be the same for all batches but allow differences in the physical type.

I think this is the key issue: the schema of the RecordBatch is the physical type. Arrow has no notion of a logical type, nor realistically can it when what this looks like is so use-case specific. Are Int32 and Int64 the same logical type? What about differing decimal precisions?

Ultimately, as the schema cannot vary within a single RecordBatch, the onus is on whatever is the origin of the inter-RecordBatch constraint to make a judgement on whether it accepts heterogeneous inputs.

This PR is effectively breaking a fairly fundamental invariant of RecordBatch to bypass checks in other components. Those checks are either necessary, because the component relies on them, or unnecessary, in which case they could/should just be removed.

Or to phrase it differently, I can't see what correct usage there could be of this API that isn't just working around an over-zealous constraint in some unrelated system.
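To illustrate why "same logical type" is a judgement for the consumer rather than for Arrow, here is a minimal standalone sketch. It deliberately does not use the arrow-rs API: `PhysicalType`, `CompatibilityPolicy`, `Strict`, `Lenient`, and `stream_is_acceptable` are all hypothetical names invented for this example, and the "lenient" rules are just one possible answer among many.

```rust
/// Hypothetical stand-in for a physical column type (NOT Arrow's `DataType`).
#[derive(Debug, Clone, PartialEq)]
enum PhysicalType {
    Int32,
    Int64,
    Decimal { precision: u8, scale: i8 },
    Utf8,
}

/// Hypothetical policy owned by whichever component imposes the
/// cross-batch constraint; it decides what "the same" means.
trait CompatibilityPolicy {
    fn compatible(&self, a: &PhysicalType, b: &PhysicalType) -> bool;
}

/// Every batch must carry exactly the same physical types.
struct Strict;

impl CompatibilityPolicy for Strict {
    fn compatible(&self, a: &PhysicalType, b: &PhysicalType) -> bool {
        a == b
    }
}

/// One possible looser judgement (an assumption of this sketch): integer
/// widths are interchangeable, and decimals match if their scales agree.
struct Lenient;

impl CompatibilityPolicy for Lenient {
    fn compatible(&self, a: &PhysicalType, b: &PhysicalType) -> bool {
        match (a, b) {
            (
                PhysicalType::Int32 | PhysicalType::Int64,
                PhysicalType::Int32 | PhysicalType::Int64,
            ) => true,
            (
                PhysicalType::Decimal { scale: s1, .. },
                PhysicalType::Decimal { scale: s2, .. },
            ) => s1 == s2,
            _ => a == b,
        }
    }
}

/// Checks every batch schema against the first one under the given policy.
/// Each inner `Vec` models the column types of one batch.
fn stream_is_acceptable<P: CompatibilityPolicy>(
    policy: &P,
    batch_schemas: &[Vec<PhysicalType>],
) -> bool {
    match batch_schemas.split_first() {
        None => true,
        Some((first, rest)) => rest.iter().all(|schema| {
            schema.len() == first.len()
                && schema
                    .iter()
                    .zip(first.iter())
                    .all(|(a, b)| policy.compatible(a, b))
        }),
    }
}
```

In this toy model each batch still has exactly one fixed schema; only the component that consumes several batches chooses between `Strict` and `Lenient`, which is where the comment argues the decision belongs.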
