shangxinli commented on PR #14435:
URL: https://github.com/apache/iceberg/pull/14435#issuecomment-3678347121

> I have seen this: #14853 - Maybe we can't use RLE? What is the normal Parquet writer using?
   
We should use RLE because it is the standard encoding for definition/repetition levels in Apache Parquet:
   
   
- The [Parquet documentation](https://parquet.apache.org/docs/file-format/data-pages/encodings) says BIT_PACKED is "deprecated and will be replaced by the RLE/bit-packing hybrid encoding".
- In parquet-java, [ParquetProperties](https://github.com/apache/parquet-java/blob/master/parquet-column/src/main/java/org/apache/parquet/column/ParquetProperties.java) creates a RunLengthBitPackingHybridValuesWriter for def/rep levels, and that [writer's](https://github.com/apache/parquet-java/blob/master/parquet-column/src/main/java/org/apache/parquet/column/values/rle/RunLengthBitPackingHybridValuesWriter.java) getEncoding() returns Encoding.RLE.
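To make the RLE point concrete, here is a minimal, self-contained sketch of the RLE/bit-packing hybrid that Parquet uses for def/rep levels. It is not parquet-java's RunLengthBitPackingHybridValuesWriter. The class `LevelRleSketch` and its methods are hypothetical. The encoder deliberately emits only RLE runs, which is still a valid hybrid stream but not always the most compact one. The sketch also omits the length prefix that precedes the levels in a v1 data page. The decoder handles both run types.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch of Parquet's RLE/bit-packing hybrid for def/rep levels.
// Not parquet-java's RunLengthBitPackingHybridValuesWriter; names are illustrative.
final class LevelRleSketch {

  // Bits needed to store any level in [0, maxLevel], e.g. maxLevel 1 -> 1 bit, 3 -> 2 bits.
  static int bitWidth(int maxLevel) {
    return 32 - Integer.numberOfLeadingZeros(maxLevel);
  }

  // Encodes levels using only RLE runs. A stream made only of RLE runs is a valid
  // hybrid stream; a real writer also emits bit-packed runs when values vary a lot.
  static byte[] encodeRleOnly(int[] levels, int bitWidth) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int byteWidth = (bitWidth + 7) / 8; // the repeated value is padded to whole bytes
    int i = 0;
    while (i < levels.length) {
      int value = levels[i];
      int runLength = 1;
      while (i + runLength < levels.length && levels[i + runLength] == value) {
        runLength++;
      }
      writeUnsignedVarInt(out, runLength << 1); // header LSB 0 marks an RLE run
      for (int b = 0; b < byteWidth; b++) {
        out.write((value >>> (8 * b)) & 0xFF); // little-endian value bytes
      }
      i += runLength;
    }
    return out.toByteArray();
  }

  // Decodes `count` levels from a hybrid stream containing RLE and/or bit-packed runs.
  static int[] decode(byte[] data, int bitWidth, int count) {
    int[] result = new int[count];
    int byteWidth = (bitWidth + 7) / 8;
    int[] pos = {0};
    int n = 0;
    while (n < count) {
      int header = readUnsignedVarInt(data, pos);
      if ((header & 1) == 0) {
        // RLE run: (header >>> 1) copies of a single value.
        int runLength = header >>> 1;
        int value = 0;
        for (int b = 0; b < byteWidth; b++) {
          value |= (data[pos[0]++] & 0xFF) << (8 * b);
        }
        for (int k = 0; k < runLength && n < count; k++) {
          result[n++] = value;
        }
      } else {
        // Bit-packed run: (header >>> 1) groups of 8 values, packed LSB-first.
        int groups = header >>> 1;
        int start = pos[0];
        for (int k = 0; k < groups * 8 && n < count; k++) {
          int value = 0;
          for (int bit = 0; bit < bitWidth; bit++) {
            int abs = k * bitWidth + bit;
            int byteVal = data[start + abs / 8] & 0xFF;
            value |= ((byteVal >>> (abs % 8)) & 1) << bit;
          }
          result[n++] = value;
        }
        pos[0] = start + groups * bitWidth; // each group of 8 values takes bitWidth bytes
      }
    }
    return result;
  }

  // ULEB128 varint, as used for run headers.
  static void writeUnsignedVarInt(ByteArrayOutputStream out, int v) {
    while ((v & ~0x7F) != 0) {
      out.write((v & 0x7F) | 0x80);
      v >>>= 7;
    }
    out.write(v);
  }

  static int readUnsignedVarInt(byte[] data, int[] pos) {
    int value = 0;
    int shift = 0;
    int b;
    do {
      b = data[pos[0]++] & 0xFF;
      value |= (b & 0x7F) << shift;
      shift += 7;
    } while ((b & 0x80) != 0);
    return value;
  }
}
```

For example, the levels `{1, 1, 1, 0, 0, 1}` with a bit width of 1 encode to three RLE runs. Each run costs a varint header plus one padded value byte regardless of its length, which is why long runs of identical levels (such as a mostly non-null optional column) stay small.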
   
Our code (ParquetFileMerger.java:537-541):
- Lines 537-538: Encoding.RLE for definition/repetition levels
- Line 539: Encoding.DELTA_BINARY_PACKED for data values
   
PR #14853 is about reading data pages whose values are RLE-encoded, not about def/rep levels. Since we use DELTA_BINARY_PACKED for data values (line 539), that PR doesn't affect us.
   

