crepererum commented on code in PR #3251:
URL: https://github.com/apache/arrow-rs/pull/3251#discussion_r1037039354


##########
arrow/src/row/mod.rs:
##########
@@ -343,6 +344,54 @@ mod variable;
 /// └───────┴───────────────┴───────┴─────────┴───────┘
 /// ```
 ///
+/// ## List Encoding
+///
+/// Lists are encoded by first encoding all child elements to the row format.
+///
+/// A "canonical byte array" is then constructed by concatenating the row
+/// encodings of all their elements into a single binary array, followed
+/// by the lengths of each encoded row, and the number of elements, encoded
+/// as big endian `u32`.
+///
+/// This canonical byte array is then encoded using the variable length byte
+/// encoding described above.
+///
+/// _The lengths are not strictly necessary but greatly simplify decode, they
+/// may be removed in a future iteration_.
+///
+/// For example given:
+///
+/// ```text
+/// [1_u8, 2_u8, 3_u8]
+/// [1_u8, null]
+/// []
+/// null
+/// ```
+///
+/// The elements would be converted to:
+///
+/// ```text
+///     ┌──┬──┐     ┌──┬──┐     ┌──┬──┐     ┌──┬──┐        ┌──┬──┐
+///  1  │01│01│  2  │01│02│  3  │01│02│  1  │01│01│  null  │00│00│

Review Comment:
   ```suggestion
   ///  1  │01│01│  2  │01│02│  3  │01│03│  1  │01│01│  null  │00│00│
   ```



##########
arrow/src/row/mod.rs:
##########
@@ -343,6 +344,54 @@ mod variable;
 /// └───────┴───────────────┴───────┴─────────┴───────┘
 /// ```
 ///
+/// ## List Encoding
+///
+/// Lists are encoded by first encoding all child elements to the row format.
+///
+/// A "canonical byte array" is then constructed by concatenating the row
+/// encodings of all their elements into a single binary array, followed
+/// by the lengths of each encoded row, and the number of elements, encoded
+/// as big endian `u32`.
+///
+/// This canonical byte array is then encoded using the variable length byte
+/// encoding described above.
+///
+/// _The lengths are not strictly necessary but greatly simplify decode, they
+/// may be removed in a future iteration_.
+///
+/// For example given:
+///
+/// ```text
+/// [1_u8, 2_u8, 3_u8]
+/// [1_u8, null]
+/// []
+/// null
+/// ```
+///
+/// The elements would be converted to:
+///
+/// ```text
+///     ┌──┬──┐     ┌──┬──┐     ┌──┬──┐     ┌──┬──┐        ┌──┬──┐
+///  1  │01│01│  2  │01│02│  3  │01│02│  1  │01│01│  null  │00│00│
+///     └──┴──┘     └──┴──┘     └──┴──┘     └──┴──┘        └──┴──┘
+///```
+///
+/// Which would be grouped into the following canonical byte arrays:
+///
+/// ```text
+///                         ┌──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┐
+///  [1_u8, 2_u8, 3_u8]     │01│01│01│02│01│03│00│00│00│02│00│00│00│02│00│00│00│02│00│00│00│03│

Review Comment:
   I get the `│01│01│01│02│01│03│` prefix and the `│00│00│00│03│` suffix. But where do the other bytes come from?
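
One possible reading, taking the doc comment in the hunk literally: the bytes in between would be the length of each encoded child row, written as big endian `u32`, ahead of the element count. The sketch below only illustrates that reading. `canonical_bytes` is a hypothetical helper made up for this example, not a function in arrow-rs, and it assumes the corrected `│01│03│` encoding for the third element.

```rust
/// Hypothetical helper, NOT the arrow-rs implementation: builds the
/// "canonical byte array" for one list from its already-encoded child rows,
/// following the wording of the doc comment under review.
fn canonical_bytes(rows: &[Vec<u8>]) -> Vec<u8> {
    let mut out = Vec::new();
    // 1. Concatenate the row encodings of all elements.
    for row in rows {
        out.extend_from_slice(row);
    }
    // 2. Append the length of each encoded row as a big endian u32.
    for row in rows {
        out.extend_from_slice(&(row.len() as u32).to_be_bytes());
    }
    // 3. Append the number of elements as a big endian u32.
    out.extend_from_slice(&(rows.len() as u32).to_be_bytes());
    out
}

fn main() {
    // Encoded children of [1_u8, 2_u8, 3_u8]: a validity byte, then the value.
    let rows = vec![vec![0x01u8, 0x01], vec![0x01, 0x02], vec![0x01, 0x03]];
    let hex: Vec<String> = canonical_bytes(&rows)
        .iter()
        .map(|b| format!("{:02x}", b))
        .collect();
    println!("{}", hex.join(" "));
}
```

Under that reading, the three `00 00 00 02` groups are the lengths of the three two-byte child rows, which gives the 22 bytes shown in the diagram.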


