cxzl25 commented on PR #1910: URL: https://github.com/apache/orc/pull/1910#issuecomment-2081354990
> Should we use INT32 and INT64 for decimals where applicable?

Yes, Spark does this by default. It also provides an option, `spark.sql.parquet.writeLegacyFormat=true`, to match Hive's way of writing decimals.

https://github.com/apache/spark/blob/8b8ea60bd4f22ea5763a77bac2d51f25d2479be9/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetWriteSupport.scala#L328-L339

```scala
writeLegacyParquetFormat match {
  // Standard mode, 1 <= precision <= 9, writes as INT32
  case false if precision <= Decimal.MAX_INT_DIGITS => int32Writer
  // Standard mode, 10 <= precision <= 18, writes as INT64
  case false if precision <= Decimal.MAX_LONG_DIGITS => int64Writer
  // Legacy mode, 1 <= precision <= 18, writes as FIXED_LEN_BYTE_ARRAY
  case true if precision <= Decimal.MAX_LONG_DIGITS => binaryWriterUsingUnscaledLong
  // Either standard or legacy mode, 19 <= precision <= 38, writes as FIXED_LEN_BYTE_ARRAY
  case _ => binaryWriterUsingUnscaledBytes
}
```

https://github.com/apache/hive/blob/4614ce72a7f366674d89a3a78f687e419400cb89/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java#L568-L578
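For reference, the Spark selection rules above can be sketched as a small standalone Java helper. This is a hypothetical illustration (the class and method names are made up, not from Spark, Hive, or ORC); only the precision thresholds and resulting Parquet physical types follow the snippet above.

```java
// Hypothetical sketch: map a decimal's precision to the Parquet physical
// type Spark would choose, mirroring the match in ParquetWriteSupport.
public class DecimalPhysicalType {
    static final int MAX_INT_DIGITS = 9;   // max decimal precision that fits in an int32
    static final int MAX_LONG_DIGITS = 18; // max decimal precision that fits in an int64

    static String physicalTypeFor(int precision, boolean writeLegacyFormat) {
        if (!writeLegacyFormat && precision <= MAX_INT_DIGITS) {
            return "INT32";                  // standard mode, 1 <= precision <= 9
        } else if (!writeLegacyFormat && precision <= MAX_LONG_DIGITS) {
            return "INT64";                  // standard mode, 10 <= precision <= 18
        } else {
            return "FIXED_LEN_BYTE_ARRAY";   // legacy mode, or 19 <= precision <= 38
        }
    }

    public static void main(String[] args) {
        System.out.println(physicalTypeFor(9, false));   // INT32
        System.out.println(physicalTypeFor(18, false));  // INT64
        System.out.println(physicalTypeFor(18, true));   // FIXED_LEN_BYTE_ARRAY
        System.out.println(physicalTypeFor(38, false));  // FIXED_LEN_BYTE_ARRAY
    }
}
```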
