alamb commented on code in PR #5053: URL: https://github.com/apache/arrow-rs/pull/5053#discussion_r1385404443
########## parquet/README.md: ########## @@ -55,7 +55,7 @@ The `parquet` crate provides the following features which may be enabled in your ## Parquet Feature Status -- [x] All encodings supported +- [x] All encodings except BYTE_STREAM_SPLIT supported Review Comment: Maybe we can leave a link to https://github.com/apache/arrow-rs/issues/4102 to see if we can find someone to help ########## parquet/src/basic.rs: ########## @@ -304,6 +316,18 @@ impl FromStr for Encoding { // Mirrors `parquet::CompressionCodec` /// Supported compression algorithms. Review Comment: ```suggestion /// Supported block compression algorithms. Defaults to ``Self::UNCOMPRESSED`` ``` If we are changing the docs anyway, maybe we can also call out that the default compression is `NONE` as I at least have been surprised by that in the past ########## parquet/src/basic.rs: ########## @@ -215,8 +215,20 @@ pub enum Repetition { // Mirrors `parquet::Encoding` /// Encodings supported by Parquet. +/// /// Not all encodings are valid for all types. These enums are also used to specify the /// encoding of definition and repetition levels. +/// +/// Additionally, not all encodings are well supported by the broader ecosystem. Review Comment: ```suggestion /// Additionally, not all encodings are well supported by the broader Parquet ecosystem, such as parquet-mr. ``` Should we call out parquet-mr? In general it would be great to add some specificity if possible (aka what does "broad" mean, can we give examples of widely used systems that only practically support `PLAIN` / `RLE` / `RLE_DICTIONARY`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
