kiszk commented on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-616623385
I have been thinking about candidate places for the interface between the native byte order and Parquet's little-endian format. One good candidate is `Serialize()` in `parquet/column_writer.cc`. Another candidate is `TypedBufferBuilder` in `arrow/buffer_builder.h`.

Regarding `Serialize()`, this is because there is already [a conversion loop](https://github.com/apache/arrow/blob/master/cpp/src/parquet/column_writer.cc#L1781-1783) for Decimal128 that uses big-endian. On big-endian platforms, `Serialize()` for the other primitive types, including int96, would need a similar conversion loop to little-endian. This is the first step.

The above approach adds overhead, so as a second step it would be good to have new methods `AppendLE` and `UnsafeAppendLE` in `TypedBufferBuilder`, in addition to [`Append()` and `UnsafeAppend()`](https://github.com/apache/arrow/blob/master/cpp/src/arrow/buffer_builder.h#L204-L240). These new methods would ensure that typed data is written in little-endian.

I think that we can support big-endian in Parquet using this two-step approach. What do you think?
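To make the `AppendLE` idea concrete, here is a minimal standalone sketch, not Arrow code. `ByteBufferSketch` and `IsLittleEndianHost` are hypothetical names used only for illustration, and the real `TypedBufferBuilder` may need a different shape. It assumes the goal is simply to store each arithmetic value least-significant byte first, whatever the host byte order is.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical helper (not an Arrow API): true when the host is little-endian.
inline bool IsLittleEndianHost() {
  const std::uint16_t probe = 1;
  unsigned char first_byte = 0;
  std::memcpy(&first_byte, &probe, 1);
  return first_byte == 1;
}

// Hypothetical stand-in for a buffer builder; only the LE append is sketched.
class ByteBufferSketch {
 public:
  // Appends `value` with its least-significant byte first,
  // independent of the host byte order.
  template <typename T>
  void AppendLE(T value) {
    static_assert(std::is_arithmetic<T>::value,
                  "byte swapping is only meaningful for scalar values here");
    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, &value, sizeof(T));
    if (!IsLittleEndianHost()) {
      // On a big-endian host, reverse the bytes to get little-endian order.
      std::reverse(bytes, bytes + sizeof(T));
    }
    data_.insert(data_.end(), bytes, bytes + sizeof(T));
  }

  const std::vector<unsigned char>& data() const { return data_; }

 private:
  std::vector<unsigned char> data_;
};
```

On a little-endian host the branch is never taken, so the append reduces to a plain copy. The extra cost is paid only on big-endian platforms.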