samuelcolvin opened a new issue, #6310:
URL: https://github.com/apache/arrow-rs/issues/6310
See #6295: we had an issue with `MetadataLoader::load_page_index` panicking
on invalid metadata, which I "fixed" (it now returns an `Err` instead of panicking).
But since the invalid metadata was written by a very recent version of this
crate, I also wanted to work out why invalid metadata was being written in the
first place.
The problem (as shown in the `test_invalid_column_index` test in #6295) is
an invalid `ColumnIndex`. Specifically, the invalid data looked like this:
```rs
[parquet/src/file/page_index/index_reader.rs:167:5] &index = ColumnIndex {
    null_pages: [
        true,
        true,
        true,
        true,
        true,
        true,
        false,
    ],
    min_values: [
        [],
        [],
        [],
        [],
        [],
        [],
        [],
    ],
    max_values: [
        [],
        [],
        [],
        [],
        [],
        [],
        [],
    ],
    boundary_order: BoundaryOrder(
        1,
    ),
    null_counts: Some(
        [
            10944,
            10240,
            10240,
            10240,
            10240,
            10240,
            7677,
        ],
    ),
    repetition_level_histograms: None,
    definition_level_histograms: Some(
        [
            10944,
            0,
            10240,
            0,
            10240,
            0,
            10240,
            0,
            10240,
            0,
            10240,
            0,
            7677,
            30,
        ],
    ),
}
```
Note that the last item in `null_pages` is `false`, but all values in
`min_values` and `max_values` are empty. That causes the `Err` from:
https://github.com/apache/arrow-rs/blob/ee2f75a66278dbd3e7aa6b85b5322951c792a58d/parquet/src/file/page_index/index.rs#L204-L211
`is_null` is false, so `from_le_slice::<T>(min)` (and likewise for `max`) is
called. Since `T` is `i32`, 4 bytes are expected, but the vec is empty.
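
To make the failure mode concrete, here is a minimal, self-contained sketch of the same situation. It is not the crate's code: `RawColumnIndex`, `decode_i32_le` and `decode_pages` are hypothetical names, and the struct only mirrors the three fields from the dump that matter here.

```rust
/// Hypothetical, simplified view of the `ColumnIndex` fields shown in the dump.
struct RawColumnIndex {
    null_pages: Vec<bool>,
    min_values: Vec<Vec<u8>>,
    max_values: Vec<Vec<u8>>,
}

/// Decode a little-endian `i32`, returning an error (rather than panicking)
/// when the slice is not exactly 4 bytes long.
fn decode_i32_le(bytes: &[u8]) -> Result<i32, String> {
    if bytes.len() != 4 {
        return Err(format!("expected 4 bytes for i32, got {}", bytes.len()));
    }
    let mut buf = [0u8; 4];
    buf.copy_from_slice(bytes);
    Ok(i32::from_le_bytes(buf))
}

/// Decode `(min, max)` for every page. Pages flagged as all-null are skipped
/// and yield `None`; every other page must carry 4-byte min and max values.
fn decode_pages(index: &RawColumnIndex) -> Result<Vec<Option<(i32, i32)>>, String> {
    let mut out = Vec::with_capacity(index.null_pages.len());
    let pages = index
        .null_pages
        .iter()
        .zip(index.min_values.iter())
        .zip(index.max_values.iter())
        .enumerate();
    for (i, ((&is_null, min), max)) in pages {
        if is_null {
            out.push(None);
            continue;
        }
        let min = decode_i32_le(min).map_err(|e| format!("page {} min: {}", i, e))?;
        let max = decode_i32_le(max).map_err(|e| format!("page {} max: {}", i, e))?;
        out.push(Some((min, max)));
    }
    Ok(out)
}
```

Fed the shape from the dump (six `true` entries followed by one `false`, with every min and max empty), `decode_pages` returns an `Err` naming page 6, because that is the only page whose values it tries to decode.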
I've tried in vain to work out where the code is that's writing that data.
cc @adriangb @alamb.