[ 
https://issues.apache.org/jira/browse/PARQUET-574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15218330#comment-15218330
 ] 

Ryan Blue commented on PARQUET-574:
-----------------------------------

Java uses [RLE for boolean|https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/ParquetProperties.java#L210].
Bit packing is still supported for reads, but it is deprecated and Parquet MR 
no longer uses it when writing 2.0 encodings. I believe we should be able to 
read bit-packed booleans just fine, but RLE output will be much smaller if 
there are runs in the data, which is fairly common.
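
As a rough illustration of the size difference, here is a minimal Python sketch. It is not the Parquet MR or parquet-cpp code: the function names are hypothetical, and `rle_size` assumes a simplified encoder that emits only RLE runs (a varint header holding the run length shifted left by one, plus one byte for the repeated value), ignoring the bit-packed runs a real hybrid encoder would mix in.

```python
def bit_pack(values):
    """Pack booleans one bit each, least-significant bit first."""
    out = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def run_lengths(values):
    """Collapse a boolean sequence into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


def varint_len(n):
    """Number of bytes needed to store n as an unsigned LEB128 varint."""
    size = 1
    while n >= 0x80:
        n >>= 7
        size += 1
    return size


def rle_size(values):
    """Approximate encoded size in bytes when every run is an RLE run."""
    return sum(varint_len(length << 1) + 1 for _, length in run_lengths(values))


# Two long runs: RLE is far smaller than one bit per value.
long_runs = [True] * 1000 + [False] * 1000
# No runs at all: pure RLE is much worse than bit packing.
alternating = [True, False] * 1000

for name, data in (("long runs", long_runs), ("alternating", alternating)):
    print(name, "bit-packed:", len(bit_pack(data)), "rle:", rle_size(data))
```

The alternating case is why a hybrid encoder falls back to bit-packed runs when the data has no repetition.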

> Boolean format in Plain Decoder 
> --------------------------------
>
>                 Key: PARQUET-574
>                 URL: https://issues.apache.org/jira/browse/PARQUET-574
>             Project: Parquet
>          Issue Type: Improvement
>            Reporter: Fabrizio Milo
>            Priority: Trivial
>
> The Encodings.md document says that the plain encoder for boolean uses 
> [RLE/BitPacking](https://github.com/apache/parquet-format/blob/master/Encodings.md#plain-plain--0),
> while the cpp implementation seems to just use [simple bit decoding back to 
> back](https://github.com/apache/parquet-cpp/blob/master/src/parquet/encodings/plain-encoding.h#L151).
> Which one is the right format?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
