samuelcolvin commented on issue #6310:
URL: https://github.com/apache/arrow-rs/issues/6310#issuecomment-2312724126
> my guess is that the problem comes because either num_buffered_rows or
> num_page_nulls is wrong in the last page, hence
Okay, ignore that suggestion. I've done some more digging and made a bit of
progress. The key point from above is:
```
definition_level_histograms: Some(
[
...
7677,
30,
],
```
I think this is saying that the last page has 7677 null values (which
matches `null_counts`), and 30 non-null values.
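
To spell out how I'm reading that histogram, here's a rough sketch. It assumes the list is the flattened per-page layout (each page contributes `max_def_level + 1` counts, concatenated in page order) and that the column is flat and nullable, so `max_def_level` is 1. `page_null_counts` is a hypothetical helper for illustration, not something in the `parquet` crate.

```rust
/// Split a flattened definition-level histogram into per-page
/// `(not_fully_defined, fully_defined)` counts.
///
/// Assumption: each page contributes `max_def_level + 1` consecutive
/// entries, where entry `i` counts the values at definition level `i`.
/// For a flat nullable column (`max_def_level == 1`) this is simply
/// `(nulls, non_nulls)` per page.
fn page_null_counts(flat_hist: &[i64], max_def_level: usize) -> Vec<(i64, i64)> {
    let width = max_def_level + 1;
    flat_hist
        .chunks_exact(width)
        .map(|page| {
            // Only values at the maximum definition level are non-null.
            let non_null = page[max_def_level];
            let total: i64 = page.iter().sum();
            (total - non_null, non_null)
        })
        .collect()
}
```

With that reading, the last two entries (`7677`, `30`) give `(7677, 30)` for the last page, i.e. 7677 nulls and 30 non-null values.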
Sure enough, if I run `select count(*) from 'bad.parquet' where process_pid
is not null;` on the parquet file (`process_pid` is the problematic column), I
get the result `30`! All 30 non-null values are `1`.
I guess the next step is to build a parquet file with a `UInt32` column
that's mostly null except one page, and see if we can reproduce the problem.
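
As a starting point for that, a minimal sketch of the data I'd generate. `mostly_null_column` is a hypothetical helper, and putting the non-null values at the very end is an assumption, since I don't know where they sit within the page in the bad file.

```rust
/// Build `total` optional `u32` values that are all null except for the
/// last `non_null` entries, which are set to `value`.
///
/// Assumption: the non-null values are contiguous and at the end; the
/// real file may lay them out differently.
fn mostly_null_column(total: usize, non_null: usize, value: u32) -> Vec<Option<u32>> {
    assert!(non_null <= total, "non_null must not exceed total");
    // Leading run of nulls.
    let mut col = vec![None; total - non_null];
    // Trailing run of identical non-null values.
    col.extend(std::iter::repeat(Some(value)).take(non_null));
    col
}
```

For example, `mostly_null_column(7707, 30, 1)` mirrors the last page's counts (7677 nulls, 30 values of `1`). The resulting `Vec<Option<u32>>` could then be turned into a `UInt32Array` and written with `ArrowWriter`; earlier all-null pages would need to be prepended to match the real file more closely.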
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.