kiszk commented on issue #6981:
URL: https://github.com/apache/arrow/pull/6981#issuecomment-616623385


   I have been thinking about candidate places for the interface between the 
native byte order and Parquet's little-endian format. 
   
   One good candidate is `Serialize()` in `parquet/column_writer.cc`. 
Another is `TypedBufferBuilder` in `arrow/buffer_builder.h`.
   
   `Serialize()` is a natural place because there is already [a conversion 
loop](https://github.com/apache/arrow/blob/master/cpp/src/parquet/column_writer.cc#L1781-1783)
 for Decimal128 that converts values to big-endian. On big-endian platforms, 
`Serialize()` for the other primitive types, including INT96, would need a 
similar loop that converts values to little-endian. This is the first step.
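
   As a rough illustration of such a per-type conversion loop, here is a minimal, self-contained sketch. `SerializeLE`, `AppendValueLE`, `AppendUnsignedLE` and `Int96Sketch` are hypothetical names, not Parquet's API, and the three-32-bit-word INT96 layout is an assumption made for the example. Each value is reinterpreted as an unsigned integer and emitted least-significant byte first, which yields little-endian bytes on any host.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Append `value` to `out` least-significant byte first. Shifts act on the
// value, not on its memory layout, so the output is little-endian on both
// little- and big-endian hosts.
template <typename U>
void AppendUnsignedLE(U value, std::vector<uint8_t>* out) {
  static_assert(std::is_unsigned<U>::value, "expects an unsigned integer");
  for (std::size_t i = 0; i < sizeof(U); ++i) {
    out->push_back(static_cast<uint8_t>(value >> (8 * i)));
  }
}

// 4- and 8-byte physical types (INT32, INT64, FLOAT, DOUBLE): copy the native
// bits into an unsigned integer of the same width, then emit them LE.
template <typename T>
void AppendValueLE(const T& value, std::vector<uint8_t>* out) {
  static_assert(std::is_arithmetic<T>::value &&
                    (sizeof(T) == 4 || sizeof(T) == 8),
                "expects a 4- or 8-byte arithmetic type");
  using U = typename std::conditional<sizeof(T) == 4, uint32_t, uint64_t>::type;
  U bits;
  std::memcpy(&bits, &value, sizeof(T));
  AppendUnsignedLE(bits, out);
}

// Hypothetical INT96 stand-in: three 32-bit words (assumed layout),
// each emitted little-endian, in order.
struct Int96Sketch {
  uint32_t value[3];
};

inline void AppendValueLE(const Int96Sketch& v, std::vector<uint8_t>* out) {
  for (uint32_t word : v.value) AppendUnsignedLE(word, out);
}

// Hypothetical conversion loop: turn a batch of native-endian values into
// the little-endian bytes that would be written to a Parquet page.
template <typename T>
std::vector<uint8_t> SerializeLE(const T* values, std::size_t n) {
  std::vector<uint8_t> out;
  out.reserve(n * sizeof(T));
  for (std::size_t i = 0; i < n; ++i) AppendValueLE(values[i], &out);
  return out;
}
```

   On a little-endian host this produces the same bytes as a plain `memcpy`. On a big-endian host it performs the byte swap, at the cost of an extra pass over the data.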
   
   Since that approach adds overhead, as a second step it would be good to 
add new methods `AppendLE()` and `UnsafeAppendLE()` to `TypedBufferBuilder`, 
alongside the existing [`Append()` and 
`UnsafeAppend()`](https://github.com/apache/arrow/blob/master/cpp/src/arrow/buffer_builder.h#L204-L240).
 These new methods would guarantee that typed data is written in little-endian order.
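
   The proposed methods could look roughly like the following. This is a minimal sketch under stated assumptions: `SketchTypedBufferBuilder` and `UnsignedOfSize` are hypothetical, simplified stand-ins and not Arrow's `TypedBufferBuilder`; only the method names `Append`, `UnsafeAppend`, `AppendLE` and `UnsafeAppendLE` come from the discussion above. As in Arrow, the `Unsafe*` variants assume that capacity was reserved beforehand.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Maps a byte width to the unsigned integer type of that width.
template <std::size_t N> struct UnsignedOfSize;
template <> struct UnsignedOfSize<1> { using type = uint8_t; };
template <> struct UnsignedOfSize<2> { using type = uint16_t; };
template <> struct UnsignedOfSize<4> { using type = uint32_t; };
template <> struct UnsignedOfSize<8> { using type = uint64_t; };

// Hypothetical, simplified stand-in for arrow::TypedBufferBuilder<T>; it is
// not Arrow's class. It shows the proposed AppendLE()/UnsafeAppendLE()
// next to native-order Append()/UnsafeAppend().
template <typename T>
class SketchTypedBufferBuilder {
  static_assert(std::is_arithmetic<T>::value, "sketch handles arithmetic T only");

 public:
  // Ensure room for `n` more elements (bytes_.size() acts as capacity).
  void Reserve(std::size_t n) {
    const std::size_t needed = size_ + n * sizeof(T);
    if (bytes_.size() < needed) bytes_.resize(needed);
  }

  // Existing behaviour: copy the value in host (native) byte order.
  void Append(T value) {
    Reserve(1);
    UnsafeAppend(value);
  }
  // Caller must have reserved capacity beforehand.
  void UnsafeAppend(T value) {
    std::memcpy(bytes_.data() + size_, &value, sizeof(T));
    size_ += sizeof(T);
  }

  // Proposed behaviour: always write little-endian, whatever the host order.
  void AppendLE(T value) {
    Reserve(1);
    UnsafeAppendLE(value);
  }
  // Caller must have reserved capacity beforehand.
  void UnsafeAppendLE(T value) {
    using U = typename UnsignedOfSize<sizeof(T)>::type;
    U bits;
    std::memcpy(&bits, &value, sizeof(T));
    for (std::size_t i = 0; i < sizeof(T); ++i) {
      bytes_[size_ + i] = static_cast<uint8_t>(bits >> (8 * i));
    }
    size_ += sizeof(T);
  }

  const uint8_t* bytes() const { return bytes_.data(); }
  std::size_t size_bytes() const { return size_; }

 private:
  std::vector<uint8_t> bytes_;  // backing storage; size() is the capacity
  std::size_t size_ = 0;        // number of bytes written so far
};
```

   In this sketch the native `Append()` path is unchanged, so little-endian hosts pay nothing extra, while callers that must produce Parquet bytes can opt into `AppendLE()`.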
   
   I think we can support big-endian platforms in Parquet with this two-step 
approach. What do you think?
   

