mapleFU commented on code in PR #10101:
URL: https://github.com/apache/arrow-rs/pull/10101#discussion_r3395317541

##########
parquet/src/arrow/array_reader/mod.rs:
##########
@@ -85,6 +85,66 @@ pub use struct_array::StructArrayReader;
 ///
 /// Data can either be read in batches using [`ArrayReader::next_batch`] or
 /// incrementally using [`ArrayReader::read_records`] and [`ArrayReader::skip_records`].
+///
+/// # Definition and repetition levels
+///
+/// Parquet encodes nesting, nulls, and empty lists using *definition* and
+/// *repetition* levels, based on the [Dremel paper]. Some example nested
+/// readers are:
+/// * [`ListArrayReader`]
+/// * [`FixedSizeListArrayReader`]
+/// * [`MapArrayReader`]
+/// * [`StructArrayReader`]
+///
+/// Each nested reader accesses the levels via [`ArrayReader::get_def_levels`]
+/// and [`ArrayReader::get_rep_levels`] and uses them to reconstruct nulls,
+/// empty lists, and list boundaries.
+///
+/// Each nested reader is built with a definition level `D` and a repetition
+/// level `R` taken from its [`ParquetField`] (see its `def_level` / `rep_level`
+/// fields). Given a child's level pair `(d, r)`, the two levels are interpreted
+/// as follows.
+///
+/// **Definition level** — how "present" the value is at this level:
+///
+/// ```text
+/// ┌───────────────────────────┬────────────────────────────────────┐
+/// │ State                     │ def level (d)                      │
+/// ├───────────────────────────┼────────────────────────────────────┤
+/// │ present, with a value     │ d >= D                             │
+/// │ present but empty (list)  │ d == D - 1                         │
+/// │ null                      │ d <= D - 2   ← "lower still"       │

Review Comment:
   Oh I misread this :-(
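
The definition-level table quoted in the hunk above can be read as a three-way classification. The following is a minimal Rust sketch of that reading, assuming a nullable list whose reader was built with definition level `D`. The names `ListState` and `classify` are hypothetical, chosen for illustration, and are not part of the `parquet` crate or of this PR.

```rust
/// Hypothetical states for one list slot, mirroring the rows of the quoted table.
#[derive(Debug, PartialEq, Eq)]
enum ListState {
    /// `d >= D`: the list is present and this entry carries a value.
    Value,
    /// `d == D - 1`: the list is present but empty.
    Empty,
    /// `d <= D - 2`: the list itself is null (assumes a nullable list).
    Null,
}

/// Classify a child's definition level `d` against the reader's level `big_d` (`D`).
fn classify(d: i16, big_d: i16) -> ListState {
    if d >= big_d {
        ListState::Value
    } else if d == big_d - 1 {
        ListState::Empty
    } else {
        ListState::Null
    }
}

fn main() {
    // Illustrative value only: a reader with D = 2.
    let big_d = 2;
    for d in 0..=3 {
        println!("d = {d}: {:?}", classify(d, big_d));
    }
}
```

The sketch covers only the three rows visible in the quoted hunk; it does not model repetition levels or non-nullable lists.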
