imhy opened a new issue, #9982:
URL: https://github.com/apache/arrow-rs/issues/9982
**Describe the bug**
The Parquet predicate cache could incorrectly cache a leaf inside a
single-leaf optional struct as if it were a flat, non-nested Parquet column.
**To Reproduce**
* Conditions to trigger
1. Schema contains a **group** (struct) field that is `OPTIONAL` …
2. … with **exactly one** required leaf inside (`root_leaf_counts == 1`).
3. The parent struct is actually NULL for at least one row.
4. A `RowFilter` predicate references the inner leaf.
5. The inner leaf is also included in the final projection.
6. Predicate cache is enabled (default; `max_predicate_cache_size > 0`).
Disabling the cache (`builder.with_max_predicate_cache_size(0)`) hides the
bug.
**Expected behavior**
Example schema:
```
message test_schema {
  OPTIONAL group address {
    REQUIRED BYTE_ARRAY street (UTF8);
  }
}
```
Rows:
```
row 0: address = NULL
row 1: address = { street: "Main St" }
```
Query:
```sql
SELECT address
FROM test_schema
WHERE address.street IS NULL;
```
Expected result:
```
row 0: address = NULL
```
Actual (wrong) result with the predicate cache enabled, before the fix:
```
no rows returned
```
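To make the example above concrete, here is a minimal, std-only sketch of how the two rows map onto Parquet definition levels for `address.street`, and what is lost if the leaf is decoded as a flat column. It is only an illustrative model and not the actual predicate cache code: it does not use the `parquet` crate, and `decode_nested` and `decode_flat` are hypothetical helpers.

```rust
/// Definition levels for `address.street` in the example schema.
/// The only optional node on the path is `address`, so the maximum level is 1:
///   0 = `address` is NULL
///   1 = `address` is present (and `street`, being REQUIRED, is present too)
const MAX_DEF_LEVEL: i16 = 1;

/// Hypothetical helper: decode the leaf while honoring the parent's
/// definition levels, so a NULL struct yields a NULL leaf slot.
fn decode_nested<'a>(def_levels: &[i16], values: &[&'a str]) -> Vec<Option<&'a str>> {
    let mut next = values.iter();
    def_levels
        .iter()
        .map(|&d| if d == MAX_DEF_LEVEL { next.next().copied() } else { None })
        .collect()
}

/// Hypothetical helper: decode the leaf as if it were a flat REQUIRED column,
/// ignoring definition levels entirely.
fn decode_flat<'a>(values: &[&'a str]) -> Vec<Option<&'a str>> {
    values.iter().map(|v| Some(*v)).collect()
}

fn main() {
    // row 0: address = NULL, row 1: address = { street: "Main St" }
    let def_levels = [0, 1];
    let values = ["Main St"];

    // Nested view: one slot per row, with row 0 NULL.
    println!("{:?}", decode_nested(&def_levels, &values));
    // Flat view: the NULL parent is not represented at all.
    println!("{:?}", decode_flat(&values));
}
```

In this model the flat view contains no NULL slot, so a predicate such as `address.street IS NULL` has nothing to match. That is consistent with the "no rows returned" symptom, though the exact failure path inside the reader may differ.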