jhorstmann commented on issue #5338:
URL: https://github.com/apache/arrow-rs/issues/5338#issuecomment-1913604976

   Well, the good news is that configuring parquet-mr to use bitpacking 
encoding is extremely unsupported :sweat_smile: 
   
   The only way to do that seems to be classpath-shadowing 
[`ParquetProperties`](https://github.com/apache/parquet-mr/blob/apache-parquet-1.13.1/parquet-column/src/main/java/org/apache/parquet/column/ParquetProperties.java#L148).
 Changing that method to use `BitPackingValuesWriter` enables bit-packed 
encoding of definition levels. Otherwise, bit-packed encoding seems to be used 
only for a bit width of 0, where the bit order makes no difference because all 
values are zero.
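
To illustrate why the bit order only matters for non-zero levels, here is a minimal sketch in Rust. The function names `pack_lsb_first` and `pack_msb_first` are hypothetical and are not part of parquet-mr or arrow-rs. The sketch assumes, per the Parquet format specification, that the deprecated `BIT_PACKED` encoding fills each byte from the most significant bit, while the bit-packed runs of the RLE hybrid encoding fill from the least significant bit.

```rust
/// Pack each value into `width` bits, filling every output byte starting
/// at its least significant bit (assumed order of the RLE hybrid encoding).
fn pack_lsb_first(values: &[u8], width: u32) -> Vec<u8> {
    let total_bits = values.len() * width as usize;
    let mut out = vec![0u8; (total_bits + 7) / 8];
    let mut pos = 0usize; // index of the next output bit
    for &v in values {
        // Emit the value's bits from lowest to highest.
        for i in 0..width {
            if (v >> i) & 1 == 1 {
                out[pos / 8] |= 1u8 << (pos % 8);
            }
            pos += 1;
        }
    }
    out
}

/// Pack each value into `width` bits, filling every output byte starting
/// at its most significant bit (assumed order of deprecated BIT_PACKED).
fn pack_msb_first(values: &[u8], width: u32) -> Vec<u8> {
    let total_bits = values.len() * width as usize;
    let mut out = vec![0u8; (total_bits + 7) / 8];
    let mut pos = 0usize; // index of the next output bit
    for &v in values {
        // Emit the value's bits from highest to lowest.
        for i in (0..width).rev() {
            if (v >> i) & 1 == 1 {
                out[pos / 8] |= 0x80u8 >> (pos % 8);
            }
            pos += 1;
        }
    }
    out
}
```

With a bit width of 1, the levels `[1, 0, 1, 0, 1, 0, 1, 0]` pack to `0x55` LSB-first but `0xAA` MSB-first, so a reader assuming the other order would see the defined and null positions swapped. Eight zero levels pack to `0x00` either way.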
   
   The bad news is that the resulting file shows different null values when 
read by the Java and the Rust implementations.
   
   Java output:
   
   ```
   required_string1
   optional_string
   1.0
   1.0
   12345678
   12345678
   true
   false
   ```
   
   arrow-rs `parquet-read` output:
   
   ```
   {required_string: "required_string1", optional_string: null, required_double: 1.0, optional_double: null, required_timestamp: 1970-01-01 03:25:45 +00:00, optional_timestamp: null, required_bool: true, optional_bool: null}
   ```
   
   
[bitpacking_levels.parquet.zip](https://github.com/apache/arrow-rs/files/14076237/bitpacking_levels.parquet.zip)
   
   [Java code that wrote this 
file](https://github.com/jhorstmann/parquet-writer-example/blob/use-bitpacked-levels/src/main/java/net/jhorstmann/parquetwriterexample/Main.java)

