LuciferYang opened a new pull request, #3446:
URL: https://github.com/apache/parquet-java/pull/3446

   ### Rationale for this change
   Fixes #3307
   
   When repetition/definition levels are empty (i.e., `maxLevel == 0` for 
required, non-repeated fields), `DevNullValuesWriter` is used as a no-op writer 
that produces zero bytes. However, its `getEncoding()` method returns the 
deprecated `BIT_PACKED` encoding, which gets written into the `DataPageHeader` 
metadata (`repetition_level_encoding` / `definition_level_encoding`) and the 
column chunk encoding list.
   
   Since parquet-java already uses `RLE` as the encoding for levels, the 
metadata should reflect `RLE` rather than the deprecated `BIT_PACKED`. 
   
   ### What changes are included in this PR?
   Changed `DevNullValuesWriter.getEncoding()` to return `Encoding.RLE` instead 
of `Encoding.BIT_PACKED`.
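
The change can be pictured with a minimal, self-contained sketch. It does not use the real parquet-java classes: `SketchEncoding` and `DevNullLevelWriterSketch` are hypothetical stand-ins for `Encoding` and `DevNullValuesWriter`, and the method set is reduced to what the description above mentions.

```java
// Hypothetical stand-in for parquet-java's Encoding enum (only two values shown).
enum SketchEncoding { BIT_PACKED, RLE }

// Hypothetical stand-in for DevNullValuesWriter: a no-op writer for empty levels.
class DevNullLevelWriterSketch {
    // Level values are accepted and discarded.
    void writeInteger(int value) {
        // intentionally empty
    }

    // The writer never produces any bytes.
    byte[] getBytes() {
        return new byte[0];
    }

    // With this PR the reported encoding is RLE; previously it was BIT_PACKED.
    SketchEncoding getEncoding() {
        return SketchEncoding.RLE;
    }
}
```

Only the value returned by `getEncoding()` differs; the bytes written for the levels stay empty.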
   
   
   ### Are these changes tested?
   Yes. All existing tests in `parquet-column` and `parquet-hadoop` pass 
without modification.
   
   ### Are there any user-facing changes?
   Newly written Parquet files will report `RLE` instead of `BIT_PACKED` in 
page header metadata for empty repetition/definition levels. This has no impact 
on file compatibility: readers do not decode level data when the byte length 
is zero, regardless of the encoding value in the header.
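
The compatibility argument can be illustrated with a rough sketch of a reader, assuming it short-circuits on `maxLevel == 0` before looking at the header encoding. `LevelReaderSketch` and `readLevels` are hypothetical names for illustration, not parquet-java APIs, and real decoding is left out.

```java
// Hypothetical reader-side sketch; not the actual parquet-java read path.
final class LevelReaderSketch {
    enum HeaderEncoding { BIT_PACKED, RLE }

    // Returns one level per value in the page.
    static int[] readLevels(int maxLevel, HeaderEncoding headerEncoding,
                            byte[] levelBytes, int valueCount) {
        if (maxLevel == 0) {
            // Required, non-repeated field: every level is 0 and no bytes
            // are read, so the header encoding is never consulted.
            return new int[valueCount];
        }
        // Real decoding of non-empty levels is out of scope for this sketch.
        throw new UnsupportedOperationException(
            "decoding " + levelBytes.length + " bytes as " + headerEncoding + " is not sketched");
    }
}
```

Under that assumption, calling `readLevels(0, ...)` yields the same all-zero levels whether the header says `BIT_PACKED` or `RLE`.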


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

