HippoBaro commented on code in PR #9848:
URL: https://github.com/apache/arrow-rs/pull/9848#discussion_r3494355469
##########
parquet/src/arrow/record_reader/mod.rs:
##########
@@ -62,6 +62,19 @@ pub struct GenericRecordReader<V, CV> {
num_records: usize,
/// Capacity hint for pre-allocating buffers based on batch size
capacity_hint: usize,
+ /// Number of values in the values buffer (may differ from num_values when
+ /// padding_threshold is set, since list-level padding is excluded).
+ values_written: usize,
+ /// When set, `pad_nulls` only pads item-level nulls (def >= threshold)
Review Comment:
Agreed. I added
```rust
/// Definition-level threshold used for selective null padding.
///
/// With full padding (`None`), the leaf values buffer has one slot for each
/// decoded definition level. This includes placeholders for null or empty
/// parent lists, which parent `ListArrayReader`s later have to filter out
/// before computing offsets.
///
/// With selective padding (`Some(threshold)`), the threshold is the nearest
/// enclosing list/map definition level. Entries with `def < threshold`
/// describe a null/empty parent and are skipped entirely. Entries with
/// `def >= threshold` belong to an actual child item slot: real values are
/// copied, and item-level nulls are padded. The companion `compact_bitmap`
/// has the same compact length and becomes the leaf null bitmap.
padding_threshold: Option<i16>,
```
which hopefully helps the reader follow along. Let me know if you'd like to
change that.
Ref:
https://github.com/HippoBaro/arrow-rs/blob/6598016d3ce76145594d913c4de468e28b9587a6/parquet/src/arrow/record_reader/mod.rs#L68-L81
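
To illustrate the two modes described in the doc comment, here is a minimal standalone sketch. It is not the code in the PR: `pad_nulls_sketch`, its parameters, and the use of `Vec<bool>` in place of a packed bitmap are all hypothetical, chosen only to show which definition levels produce a slot in the leaf buffer.

```rust
/// Hypothetical helper (not part of the PR) mirroring the doc comment.
///
/// * `def_levels`: one decoded definition level per entry.
/// * `decoded`: the non-null leaf values, in order (one per `def == max_def`).
/// * `threshold`: `None` for full padding, `Some(t)` for selective padding.
///
/// Returns the padded values buffer and a validity vector of the same length.
fn pad_nulls_sketch(
    def_levels: &[i16],
    decoded: &[i32],
    max_def: i16,
    threshold: Option<i16>,
) -> (Vec<i32>, Vec<bool>) {
    let mut values = Vec::with_capacity(def_levels.len());
    let mut valid = Vec::with_capacity(def_levels.len());
    let mut next = decoded.iter().copied();

    for &def in def_levels {
        // Selective padding: entries below the threshold describe a
        // null/empty parent list and get no slot at all.
        if let Some(t) = threshold {
            if def < t {
                continue;
            }
        }
        if def == max_def {
            // A real value: copy it into the next slot.
            values.push(next.next().expect("one decoded value per max-def entry"));
            valid.push(true);
        } else {
            // A null: pad with a default placeholder.
            values.push(0);
            valid.push(false);
        }
    }
    (values, valid)
}
```

For an optional list of optional `i32` (assuming def 0 = null list, 1 = empty list, 2 = null item, 3 = value), levels `[3, 2, 0, 1, 3]` with decoded values `[10, 20]` give five slots under `None` but only three under `Some(2)`, so the parent reader has nothing left to filter out.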