jhorstmann commented on issue #5338: URL: https://github.com/apache/arrow-rs/issues/5338#issuecomment-1913604976
Well, the good news is that configuring parquet-mr to use bitpacking encoding is extremely unsupported :sweat_smile: The only way to do that seems to be classpath-shadowing [`ParquetProperties`](https://github.com/apache/parquet-mr/blob/apache-parquet-1.13.1/parquet-column/src/main/java/org/apache/parquet/column/ParquetProperties.java#L148). Changing that method to use `BitPackingValuesWriter` enables bitpacked encoding of definition levels. Otherwise bitpacked seems to be used only for a bit width of 0, where the bit order does not make a difference because all values are zero.

The bad news is that the resulting file shows different null values for the Java vs the Rust implementation.

Java output:

```
required_string1 optional_string 1.0 1.0 12345678 12345678 true false
```

Arrow-rs `parquet-read` output:

```
{required_string: "required_string1", optional_string: null, required_double: 1.0, optional_double: null, required_timestamp: 1970-01-01 03:25:45 +00:00, optional_timestamp: null, required_bool: true, optional_bool: null}
```

[bitpacking_levels.parquet.zip](https://github.com/apache/arrow-rs/files/14076237/bitpacking_levels.parquet.zip)

[Java code that wrote this file](https://github.com/jhorstmann/parquet-writer-example/blob/use-bitpacked-levels/src/main/java/net/jhorstmann/parquetwriterexample/Main.java)
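To illustrate why bit order could matter here, the following is a minimal sketch and not the arrow-rs or parquet-mr code. It assumes the ordering described in the Parquet format spec: the deprecated `BIT_PACKED` encoding packs values starting from the most significant bit, while the RLE/bit-packing hybrid packs from the least significant bit. The function names `unpack_msb_first` and `unpack_lsb_first` are hypothetical, and whether this is the actual cause of the mismatch in the attached file is not confirmed.

```rust
/// Unpack `count` values of `bit_width` bits each (bit_width <= 8),
/// consuming each byte from its most significant bit downwards.
/// This is the order assumed for the deprecated BIT_PACKED encoding.
fn unpack_msb_first(data: &[u8], bit_width: usize, count: usize) -> Vec<u8> {
    let mut out = Vec::with_capacity(count);
    for i in 0..count {
        let mut value = 0u8;
        for b in 0..bit_width {
            let bit_index = i * bit_width + b;
            let byte = data[bit_index / 8];
            // Bit 0 of the stream is the top bit of the first byte.
            let bit = (byte >> (7 - bit_index % 8)) & 1;
            value = (value << 1) | bit;
        }
        out.push(value);
    }
    out
}

/// Unpack `count` values of `bit_width` bits each (bit_width <= 8),
/// consuming each byte from its least significant bit upwards.
/// This is the order assumed for bit-packed runs in the RLE hybrid.
fn unpack_lsb_first(data: &[u8], bit_width: usize, count: usize) -> Vec<u8> {
    let mut out = Vec::with_capacity(count);
    for i in 0..count {
        let mut value = 0u8;
        for b in 0..bit_width {
            let bit_index = i * bit_width + b;
            let byte = data[bit_index / 8];
            // Bit 0 of the stream is the bottom bit of the first byte.
            let bit = (byte >> (bit_index % 8)) & 1;
            value |= bit << b;
        }
        out.push(value);
    }
    out
}
```

Under these assumptions, a single definition level of 1 written MSB-first is the byte `0b1000_0000`. `unpack_msb_first(&[0b1000_0000], 1, 1)` gives `[1]` (value present), while `unpack_lsb_first` on the same byte gives `[0]` (null). With a bit width of 0, both functions return only zeros and never touch the input, which matches the observation that the order is irrelevant in that case.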
