carols10cents opened a new issue, #5342:
URL: https://github.com/apache/arrow-rs/issues/5342
**Is your feature request related to a problem or challenge? Please describe
what you are trying to do.**
I have a bunch of record batches, and I'd like to unify their schemas. I
would like to create one new `Schema` instance, which I will have ownership of,
of course, but it seems like the schema merge should only need to borrow the
record batches' schemas.
**Describe the solution you'd like**
I wish I could write:
```
use arrow::{datatypes::Schema, error::ArrowError, record_batch::RecordBatch};
fn schema_merge(batches: &[RecordBatch]) -> Result<Schema, ArrowError> {
Schema::try_merge(batches.iter().map(|b| b.schema()))
}
```
but that doesn't compile because `batch.schema()` returns an owned
`Arc<Schema>` and `try_merge` expects `impl IntoIterator<Item = Schema>`:
```
425 | Schema::try_merge(batches.iter().map(|b| b.schema()))
| ----------------- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected
`Schema`, found `Arc<Schema>`
| |
| required by a bound introduced by this call
|
= note: expected struct `arrow::datatypes::Schema`
found struct `std::sync::Arc<arrow::datatypes::Schema>`
```
It seems like `Schema::try_merge` could be changed to only need references
to the schemas (or a new function if you didn't want to break backwards
compatibility):
```diff
- pub fn try_merge(schemas: impl IntoIterator<Item = Self>) ->
Result<Self, ArrowError> {
+ pub fn try_merge<'a>(
+ schemas: impl IntoIterator<Item = &'a Schema>,
+ ) -> Result<Schema, ArrowError> {
...
// merge metadata
for (key, value) in metadata.into_iter() {
- if let Some(old_val) = out_meta.get(&key) {
- if old_val != &value {
+ if let Some(old_val) = out_meta.get(key) {
+ if old_val != value {
...
- out_meta.insert(key, value);
+ out_meta.insert(key.clone(), value.clone());
}
}
```
buuuuuut then you'd have this problem if you try to use `as_ref` to get
`&Schema` from the `Arc<Schema>`s:
```
error[E0515]: cannot return value referencing temporary value
--> bulk_ingester/src/fsm/create_intermediate_files.rs:425:40
|
425 | my_try_merge(batches.iter().map(|b| b.schema().as_ref()))
| ----------^^^^^^^^^
| |
| returns a value referencing
data owned by the current function
| temporary value created here
```
so it seems like if there's a `Schema::try_merge_from_borrowed_schemas`,
then it'd also be nice to have `RecordBatch::borrow_schema` that returns
`&Schema` (not to be confused with `SchemaRef`, of course ;))
What do you think?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]