alamb commented on issue #7482:
URL:
https://github.com/apache/arrow-datafusion/issues/7482#issuecomment-1707105827
BTW I found another workaround in IOx. We had configured a `FileScanConfig`
like this (with `output_ordering: vec![vec![]]`, i.e. a single ordering that contains no sort expressions):
```rust
let base_config = FileScanConfig {
    object_store_url: self.object_store_url.clone(),
    file_schema: schema,
    file_groups: vec![vec![PartitionedFile {
        object_meta: self.object_meta.clone(),
        partition_values: vec![],
        range: None,
        extensions: None,
    }]],
    statistics: Statistics::default(),
    projection: None,
    limit: None,
    table_partition_cols: vec![],
    // Parquet files ARE actually sorted but we don't care here
    // since we just construct a `collect` plan.
    output_ordering: vec![vec![]],
    infinite_source: false,
};
```
I could stop the crashes like this:
```diff
diff --git a/parquet_file/src/storage.rs b/parquet_file/src/storage.rs
index 285e272f7..c520e3bd0 100644
--- a/parquet_file/src/storage.rs
+++ b/parquet_file/src/storage.rs
@@ -137,7 +137,7 @@ impl ParquetExecInput {
             limit: None,
             table_partition_cols: vec![],
             // Parquet files ARE actually sorted but we don't care here since we just construct a `collect` plan.
-            output_ordering: vec![vec![]],
+            output_ordering: vec![],
             infinite_source: false,
         };
         let exec = ParquetExec::new(base_config, None, None);
```
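To make the difference between the two values concrete, here is a minimal std-only sketch. `SortExpr` is a stand-in `String` alias rather than DataFusion's real sort expression type, and `leading_exprs` is a hypothetical consumer, not a DataFusion function. It only illustrates that `vec![vec![]]` declares one (empty) ordering while `vec![]` declares none, so any code that looks at the first expression of each ordering has to handle the empty case.

```rust
// Stand-in for a sort expression; the real element type is DataFusion's own.
type SortExpr = String;

/// Hypothetical consumer: returns the leading sort expression of each
/// declared ordering, or `None` when an ordering has no expressions.
fn leading_exprs(orderings: &[Vec<SortExpr>]) -> Vec<Option<&SortExpr>> {
    orderings.iter().map(|ordering| ordering.first()).collect()
}

fn main() {
    // One declared ordering that contains no sort expressions.
    let one_empty: Vec<Vec<SortExpr>> = vec![vec![]];
    // No declared orderings at all.
    let none: Vec<Vec<SortExpr>> = vec![];

    assert_eq!(one_empty.len(), 1);
    assert!(one_empty[0].is_empty());
    assert!(none.is_empty());

    // The empty inner ordering still produces an entry, and that entry is `None`.
    let leading = leading_exprs(&one_empty);
    assert_eq!(leading.len(), 1);
    assert!(leading[0].is_none());

    // With no orderings there is nothing to inspect.
    assert!(leading_exprs(&none).is_empty());
}
```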
I do think this illustrates why having a dedicated structure to encapsulate
the output orderings might be nice, though it is definitely not necessary.
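
As a rough sketch of what such a structure could look like (everything here is hypothetical: `OutputOrderings`, its methods, and the stand-in `SortExpr` type are illustrative names, not existing DataFusion APIs), a wrapper could drop empty orderings at construction time so that `vec![vec![]]` and `vec![]` end up meaning the same thing:

```rust
/// Stand-in for DataFusion's sort expression type, to keep the sketch self-contained.
#[derive(Debug, Clone, PartialEq)]
struct SortExpr {
    column: String,
    descending: bool,
}

/// Hypothetical wrapper that never stores an ordering with zero sort expressions.
#[derive(Debug, Clone, Default, PartialEq)]
struct OutputOrderings {
    orderings: Vec<Vec<SortExpr>>,
}

impl OutputOrderings {
    /// Builds the wrapper, discarding any ordering that has no expressions.
    fn new(orderings: Vec<Vec<SortExpr>>) -> Self {
        let orderings = orderings.into_iter().filter(|o| !o.is_empty()).collect();
        Self { orderings }
    }

    /// True when no ordering is declared.
    fn is_unordered(&self) -> bool {
        self.orderings.is_empty()
    }

    /// Read-only view of the normalized orderings.
    fn as_slice(&self) -> &[Vec<SortExpr>] {
        &self.orderings
    }
}

fn main() {
    // Both spellings of "no ordering" normalize to the same value.
    let from_empty_inner = OutputOrderings::new(vec![vec![]]);
    let from_empty_outer = OutputOrderings::new(vec![]);
    assert_eq!(from_empty_inner, from_empty_outer);
    assert!(from_empty_inner.is_unordered());

    // A real ordering is kept as-is.
    let sorted = OutputOrderings::new(vec![vec![SortExpr {
        column: "time".to_string(),
        descending: false,
    }]]);
    assert!(!sorted.is_unordered());
    assert_eq!(sorted.as_slice().len(), 1);
}
```

With the invariant enforced in one constructor, downstream code would not need to guard against an empty inner ordering at each use site.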