v0y4g3r opened a new issue, #3506: URL: https://github.com/apache/arrow-rs/issues/3506
**Describe the bug**

As per [Parquet's spec](https://parquet.apache.org/docs/file-format/data-pages/encodings/) and the [Java implementation](https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/statistics/IntStatistics.java#L82), statistics use plain encoding, which encodes INT64 values as little-endian bytes.

Arrow's official Parquet implementation decodes column statistics as little endian:

https://github.com/apache/arrow-rs/blob/3788fd20f053ee58f08b4d09cd4dac5bb9b96c06/parquet/src/file/statistics.rs#L171-L177

But when writing the min/max values of statistics, it simply reinterprets the in-memory representation of the i64 value as a byte slice, which is platform dependent:

https://github.com/apache/arrow-rs/blob/3788fd20f053ee58f08b4d09cd4dac5bb9b96c06/parquet/src/data_type.rs#L451-L463

**To Reproduce**

This should be easy to reproduce on a big-endian machine, but I don't have a big-endian device such as a MIPS server at hand.

**Expected behavior**

Encode the min/max values of statistics as little-endian bytes.

**Additional context**

When encoding stats, Parquet uses the `AsBytes` trait to convert an i64 into a byte slice:

https://github.com/apache/arrow-rs/blob/3788fd20f053ee58f08b4d09cd4dac5bb9b96c06/parquet/src/data_type.rs#L428-L431

The lifetime of the returned slice is therefore bound to the value itself. To convert an i64 into a little-endian byte slice on a big-endian platform, we must create a temporary array that holds the converted bytes, instead of just reinterpreting the address of the value as a byte slice. That temporary array is dropped when `as_bytes` returns, so a slice pointing into it cannot satisfy the trait's lifetime constraint.

We may need to change `AsBytes` into something like:

```rs
pub trait AsBytes {
    fn encode(&self, buf: &mut Vec<u8>) -> usize;
}
```
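To make the proposal concrete, here is a minimal sketch of how an `encode`-style trait could produce platform-independent bytes. The trait name `EncodeLe` and the helper `decode_i64_le` are hypothetical and are not part of arrow-rs; only the standard library's `to_le_bytes` and `from_le_bytes` are relied on.

```rust
/// Hypothetical trait following the `encode` signature proposed above.
/// It is not part of arrow-rs; the name is made up for illustration.
pub trait EncodeLe {
    /// Appends the little-endian encoding of `self` to `buf`
    /// and returns the number of bytes written.
    fn encode(&self, buf: &mut Vec<u8>) -> usize;
}

impl EncodeLe for i64 {
    fn encode(&self, buf: &mut Vec<u8>) -> usize {
        // `to_le_bytes` returns an owned array, so the result is the same
        // on little-endian and big-endian hosts.
        let bytes = self.to_le_bytes();
        buf.extend_from_slice(&bytes);
        bytes.len()
    }
}

impl EncodeLe for i32 {
    fn encode(&self, buf: &mut Vec<u8>) -> usize {
        let bytes = self.to_le_bytes();
        buf.extend_from_slice(&bytes);
        bytes.len()
    }
}

/// Hypothetical helper mirroring the read side: interprets the first
/// eight bytes as a little-endian i64, or returns `None` if too short.
pub fn decode_i64_le(bytes: &[u8]) -> Option<i64> {
    let mut arr = [0u8; 8];
    arr.copy_from_slice(bytes.get(..8)?);
    Some(i64::from_le_bytes(arr))
}
```

Because the caller owns the buffer, no slice has to borrow from a temporary, which avoids the lifetime problem described above. The cost is a copy into the buffer on every call, including on little-endian hosts where the existing reinterpretation is already correct.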
