jonded94 commented on code in PR #8790:
URL: https://github.com/apache/arrow-rs/pull/8790#discussion_r2582588902
##########
arrow-array/src/ffi_stream.rs:
##########
@@ -364,7 +364,13 @@ impl Iterator for ArrowArrayStreamReader {
let result = unsafe {
from_ffi_and_data_type(array,
DataType::Struct(self.schema().fields().clone()))
};
- Some(result.map(|data| RecordBatch::from(StructArray::from(data))))
+ Some(result.map(|data| {
+ let struct_array = StructArray::from(data);
Review Comment:
>> What is current on main is pretty elegant
>
> `RecordBatch::from(StructArray::from(data)))`
I personally wouldn't call this elegant, as it's actually *currently*
leading to unsoundness problems which I already stated a few times in this
discussion :grimacing:
1. Since `StructArray` is metadata-less, the `RecordBatch` constructed from
it is metadata-less by definition too. This means, the returned `RecordBatch`
can't ever have the schema advertised by `ArrowArrayStreamReader`, at least if
it's supposed to have metadata.
2. Much more importantly: There currently are no checks whether the
`RecordBatch` constructed from `StructArray` and returned by the stream reader
even corresponds to the schema which `ArrowArrayStreamReader` advertises. They
are just returned as-is, and in prinicple it's possible to make this interface
return an entirely different `RecordBatch`. This was the cause of this PR,
namely that it's currently returning unsound `RecordBatch` without advertising
this as an error somewhere.
The current state with `unsafe` in this PR more or less leaves problem 2
intact (with arguably introducing a chance of an additional invariant
incorrectness in the `RecordBatch` itself), but at least fixes problem 1.
Nevertheless, I will proceed with introducing schema checking now and remove
`unsafe`, which would also fix problem 2 in general, with a tiny additional
compute cost.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]