nevi-me commented on a change in pull request #8792:
URL: https://github.com/apache/arrow/pull/8792#discussion_r532205107
##########
File path: rust/parquet/src/arrow/arrow_writer.rs
##########
@@ -423,25 +313,64 @@ fn write_leaf(
Ok(written as i64)
}
-/// A struct that represents definition and repetition levels.
-/// Repetition levels are only populated if the parent or current leaf is repeated
-#[derive(Debug)]
-struct Levels {
- definition: Vec<i16>,
- repetition: Option<Vec<i16>>,
-}
-
/// Compute nested levels of the Arrow array, recursing into lists and structs
-fn get_levels(
+/// Returns a list of `LevelInfo`, one for each nested primitive array.
+///
+/// The algorithm works by eagerly incrementing the level for non-null values,
+/// and decrementing it when a value is null.
+///
+/// *Examples:*
+///
+/// A record batch always starts at a populated definition level of 1.
+/// When a batch only has a primitive, i.e. `<batch<primitive[a]>>`, column `a`
+/// can only have a maximum level of 1 if it is not null.
+/// If it is null, we decrement by 1, so that null slots have level 0.
+///
+/// If a batch has nested arrays (list, struct, union, etc.), then incrementing
+/// takes place at each level of nesting.
+/// A `<batch<struct[a]<primitive[b]>>>` will have up to 2 levels (if nullable).
+/// When calculating levels for `a`, if the struct slot is not empty, we
+/// increment by 1, so that we'd have `[2, 2, 2]` if all 3 slots are non-null.
+/// If there is an empty slot, we decrement, leaving us with `[2, 0, 2]`, as the
+/// null slot effectively means that no record is populated for the row altogether.
+///
+/// *Lists*
+///
+/// TODO
+///
+/// *Non-nullable arrays*
+///
+/// If an array is non-nullable, this is accounted for when converting the Arrow
+/// schema to a Parquet schema.
+/// When dealing with `<batch<primitive[_]>>` there is no issue, as the maximum
+/// level will always be 1.
+///
+/// When dealing with nested types, the logic becomes a bit more complicated.
Review comment:
Done
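For illustration, here is a minimal Rust sketch of the two level computations the doc comment spells out: a primitive column directly under the batch, and a struct column's own slot levels. It assumes, as the comment states, that the batch starts at level 1. `BATCH_LEVEL`, `primitive_levels` and `struct_levels` are hypothetical names, not part of the parquet crate, and the sketch takes a plain validity slice rather than an Arrow array. It does not attempt the list case (still a TODO above) or deeper nesting.

```rust
/// Hypothetical constant: the definition level a record batch starts at,
/// per the doc comment ("always starts at a populated definition level of 1").
const BATCH_LEVEL: i16 = 1;

/// Hypothetical helper: levels for a primitive column directly under the batch.
/// Non-null slots stay at the batch level (the maximum of 1);
/// null slots are decremented by 1, giving level 0.
fn primitive_levels(validity: &[bool]) -> Vec<i16> {
    validity
        .iter()
        .map(|&is_valid| if is_valid { BATCH_LEVEL } else { BATCH_LEVEL - 1 })
        .collect()
}

/// Hypothetical helper: levels for a struct column's own slots under the batch.
/// Non-null slots are incremented by 1 (level 2); null slots are decremented
/// by 1 (level 0), since a null struct means nothing is populated for that row.
fn struct_levels(validity: &[bool]) -> Vec<i16> {
    validity
        .iter()
        .map(|&is_valid| if is_valid { BATCH_LEVEL + 1 } else { BATCH_LEVEL - 1 })
        .collect()
}
```

With validity `[true, false, true]`, `primitive_levels` yields `[1, 0, 1]` and `struct_levels` yields `[2, 0, 2]`, which matches the struct example in the doc comment.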
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]