alamb commented on code in PR #4818:
URL: https://github.com/apache/arrow-rs/pull/4818#discussion_r1326442756

##########
arrow-row/src/lib.rs:
##########
@@ -232,13 +232,13 @@ mod variable;
 /// A non-null, non-empty byte array is encoded as `2_u8` followed by the byte array
 /// encoded using a block based scheme described below.
 ///
-/// The byte array is broken up into 32-byte blocks, each block is written in turn
+/// The byte array is broken up into fixed-width blocks, each block is written in turn
 /// to the output, followed by `0xFF_u8`. The final block is padded to 32-bytes
 /// with `0_u8` and written to the output, followed by the un-padded length in bytes
-/// of this final block as a `u8`.
+/// of this final block as a `u8`. The first 4 blocks have a length of 8, with subsequent
+/// blocks using a length of 32.

Review Comment:
I think it would help to explain the rationale for using smaller blocks up front (to avoid wasting space on smaller strings).
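
To make the rationale concrete, here is a rough sketch of the encoded size under both layouts. It is based only on the doc comment in the diff, not on the actual implementation in this PR. The constant and function names are hypothetical, and it assumes the final block is padded to its own block width (8 or 32 bytes).

```rust
// Hypothetical names; the sizes come from the doc comment in the diff.
const SMALL_BLOCK: usize = 8; // width of each of the first 4 blocks
const SMALL_BLOCK_COUNT: usize = 4; // number of small blocks before switching
const LARGE_BLOCK: usize = 32; // width of every subsequent block

/// Integer division rounding up.
fn ceil_div(a: usize, b: usize) -> usize {
    (a + b - 1) / b
}

/// Encoded size of a non-empty byte array when every block is 32 bytes wide
/// (the layout described by the removed doc lines).
fn encoded_len_fixed(len: usize) -> usize {
    assert!(len > 0, "sketch only covers non-empty byte arrays");
    // 1 leading `2_u8` marker, then each block plus 1 trailing byte
    // (`0xFF_u8` for a full block, the un-padded length for the final block).
    1 + ceil_div(len, LARGE_BLOCK) * (LARGE_BLOCK + 1)
}

/// Encoded size when the first 4 blocks are 8 bytes wide and later blocks
/// are 32 bytes wide (the layout described by the added doc lines).
/// Assumption: the final block is padded to its own width.
fn encoded_len_variable(len: usize) -> usize {
    assert!(len > 0, "sketch only covers non-empty byte arrays");
    let small_capacity = SMALL_BLOCK * SMALL_BLOCK_COUNT; // 32 bytes
    if len <= small_capacity {
        // Only small blocks are needed.
        1 + ceil_div(len, SMALL_BLOCK) * (SMALL_BLOCK + 1)
    } else {
        // All 4 small blocks are full, the remainder goes into large blocks.
        let rest = len - small_capacity;
        1 + SMALL_BLOCK_COUNT * (SMALL_BLOCK + 1)
            + ceil_div(rest, LARGE_BLOCK) * (LARGE_BLOCK + 1)
    }
}
```

Under these assumptions a 5-byte string takes 10 bytes with the smaller leading blocks versus 34 bytes with 32-byte blocks only, while strings of 25 bytes or more pay 3 extra bytes for the additional block terminators. Something along those lines in the doc comment would explain the trade-off.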
