jleibs commented on issue #7628:
URL: https://github.com/apache/arrow-rs/issues/7628#issuecomment-2970823649
It's a bit orthogonal to the specific request, but I believe the motivating
factors for @emilk's request here point to a deeper source of confusion.
When we talk about attaching metadata to a `RecordBatch` I believe there are
two separate things we could be talking about:
- Message metadata:
  - https://github.com/apache/arrow/blob/2ba455f17e7cdbfe2b2f1aa3dfb2bf00878a17e1/format/Message.fbs#L154
- Schema metadata:
  - https://github.com/apache/arrow/blob/2ba455f17e7cdbfe2b2f1aa3dfb2bf00878a17e1/format/Schema.fbs#L565
The proposed solution reinforces this confusion. I would expect
`RecordBatch::metadata_mut()` to modify the former, but the proposed solution
would in fact modify the latter.
It's not even clear to me whether arrow-rs tracks or exposes the former (message metadata) anywhere.
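To make the distinction between the two levels concrete, here is a minimal Rust sketch. The types `ToySchema` and `ToyMessage` and the helper `message_with` are hypothetical stand-ins I made up for illustration; they are not arrow-rs APIs. They only mirror the two `custom_metadata` fields linked above.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for illustration only; these are NOT arrow-rs types.
type KeyValues = HashMap<String, String>;

/// Mirrors `Schema.custom_metadata` in Schema.fbs: attached to the schema.
#[derive(Debug, Clone, PartialEq)]
struct ToySchema {
    field_names: Vec<String>,
    custom_metadata: KeyValues,
}

/// Mirrors `Message.custom_metadata` in Message.fbs: one map per IPC message.
#[derive(Debug, Clone, PartialEq)]
struct ToyMessage {
    custom_metadata: KeyValues,
    num_rows: usize, // stands in for the record batch body
}

/// Builds a message carrying `num_rows` rows and a single metadata entry.
fn message_with(key: &str, value: &str, num_rows: usize) -> ToyMessage {
    let mut custom_metadata = KeyValues::new();
    custom_metadata.insert(key.to_string(), value.to_string());
    ToyMessage { custom_metadata, num_rows }
}

fn main() {
    let mut schema = ToySchema {
        field_names: vec!["x".to_string()],
        custom_metadata: KeyValues::new(),
    };
    schema.custom_metadata.insert("unit".to_string(), "meters".to_string());

    let stream = vec![message_with("seq", "0", 3), message_with("seq", "1", 5)];

    // Per-message metadata differs between messages while the schema,
    // including its own metadata, is untouched.
    assert_ne!(stream[0].custom_metadata, stream[1].custom_metadata);
    assert_eq!(schema.custom_metadata.len(), 1);

    let total_rows: usize = stream.iter().map(|m| m.num_rows).sum();
    println!("{} field(s), {} rows", schema.field_names.len(), total_rows);
}
```

In this model, a `metadata_mut()` on a batch could plausibly refer to either map, which is the ambiguity I'm pointing at.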
A corollary to this is that, even after reading the docs, I can't say
whether it's considered an error for several RecordBatches in the same logical
stream to contain different schemas / metadata values.
The existence of `schema()` on interfaces like `RecordBatchReader`, to me,
suggests many implementers would likely expect some guarantee along the lines
of:
```
let schema = reader.schema();
for batch in reader {
    // The reader yields Result<RecordBatch, ArrowError> items.
    assert_eq!(schema, batch?.schema());
}
```
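Whether such a guarantee should cover schema metadata is part of what is unclear. The sketch below shows the two possible readings using hypothetical toy types (`ToySchema`, `ToyBatch`, `first_mismatch`, `sample`); none of these are arrow-rs APIs, and the `strict` flag is my own assumption about how one might parameterize the check.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-ins for illustration only; these are NOT arrow-rs types.
#[derive(Debug, Clone, PartialEq)]
struct ToySchema {
    fields: Vec<(String, String)>, // (field name, type name)
    metadata: BTreeMap<String, String>,
}

struct ToyBatch {
    schema: ToySchema,
}

/// Returns the index of the first batch whose schema disagrees with `declared`.
/// With `strict`, schema metadata must match too; otherwise only fields count.
fn first_mismatch(declared: &ToySchema, batches: &[ToyBatch], strict: bool) -> Option<usize> {
    batches.iter().position(|b| {
        if strict {
            b.schema != *declared
        } else {
            b.schema.fields != declared.fields
        }
    })
}

/// Builds a declared schema plus two batches; the second batch has the same
/// fields but an extra schema metadata entry.
fn sample() -> (ToySchema, Vec<ToyBatch>) {
    let declared = ToySchema {
        fields: vec![("x".to_string(), "Int32".to_string())],
        metadata: BTreeMap::new(),
    };
    let mut tagged = declared.clone();
    tagged.metadata.insert("note".to_string(), "late".to_string());
    let batches = vec![
        ToyBatch { schema: declared.clone() },
        ToyBatch { schema: tagged },
    ];
    (declared, batches)
}

fn main() {
    let (declared, batches) = sample();
    // Strict reading: differing metadata is a schema mismatch.
    assert_eq!(first_mismatch(&declared, &batches, true), Some(1));
    // Lenient reading: only the fields need to agree.
    assert_eq!(first_mismatch(&declared, &batches, false), None);
}
```

Under the strict reading, a stream whose batches carry varying schema metadata would be invalid; under the lenient one it would be fine. The docs don't tell me which is intended.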
Some of the validation questions have been touched on here:
- https://github.com/apache/arrow-rs/pull/4800
- https://github.com/apache/arrow-rs/issues/4801
But those don't seem to bring up the topic of "message metadata" either.