[ 
https://issues.apache.org/jira/browse/PARQUET-574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15218740#comment-15218740
 ] 

Fabrizio Milo commented on PARQUET-574:
---------------------------------------

[~rdblue] So is the length part omitted, i.e. is it implied to be the length of
the whole remaining data page, or is the length also written as a 4-byte
unsigned integer before the actual RLE runs?
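For reference, here is what reading the prefix would look like if the length
is written. This is a minimal sketch, assuming the prefix is a 4-byte
little-endian unsigned int32 giving the size of <encoded data> in bytes;
`split_length_prefixed` is a hypothetical helper, not a parquet-cpp API.

```python
import struct


def split_length_prefixed(buf: bytes, offset: int = 0) -> tuple[bytes, int]:
    """Read an (assumed) 4-byte little-endian unsigned length, then that many bytes.

    Returns (encoded_data, offset just past the encoded data).
    """
    (length,) = struct.unpack_from("<I", buf, offset)  # assumed uint32 LE prefix
    start = offset + 4
    end = start + length
    if end > len(buf):
        raise ValueError(
            f"length prefix {length} exceeds remaining {len(buf) - start} bytes"
        )
    return buf[start:end], end


# Usage: a 2-byte payload followed by unrelated trailing bytes.
page = struct.pack("<I", 2) + bytes([0x0A, 0x01]) + b"rest"
encoded, end = split_length_prefixed(page)
```

If no prefix is written, the whole remainder of the page would instead be the
encoded data, and a reader that assumes a prefix would misread the first four
bytes of the runs as a length.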

I am puzzled by the <length> <encoded data> layout. See also
https://issues.apache.org/jira/browse/PARQUET-575 

It seems the length is present only in the definition levels; if that is the
case, the format spec should be updated. 
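Whatever the answer on the prefix, the runs inside <encoded data> follow the
RLE/bit-packing hybrid grammar in Encodings.md: each run starts with a ULEB128
varint header whose low bit selects a bit-packed run (1, with header >> 1
groups of 8 values packed LSB first) or an RLE run (0, with header >> 1
repetitions of a single value padded to whole bytes). The sketch below is a
minimal Python decoder under that reading, taking the bytes after any length
prefix has been removed; the function names are hypothetical. For definition
levels, `bit_width` is assumed to be the number of bits needed to hold the
column's maximum definition level.

```python
def read_uleb128(buf: bytes, pos: int) -> tuple[int, int]:
    """Read an unsigned LEB128 varint starting at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def decode_hybrid_runs(data: bytes, bit_width: int, num_values: int) -> list[int]:
    """Decode RLE/bit-packed hybrid runs (no length prefix) into num_values ints."""
    values: list[int] = []
    pos = 0
    value_bytes = (bit_width + 7) // 8  # RLE value is padded to whole bytes
    mask = (1 << bit_width) - 1
    while len(values) < num_values and pos < len(data):
        header, pos = read_uleb128(data, pos)
        if header & 1:
            # Bit-packed run: (header >> 1) groups of 8 values, LSB first.
            count = (header >> 1) * 8
            nbytes = count * bit_width // 8
            bits = int.from_bytes(data[pos:pos + nbytes], "little")
            pos += nbytes
            for i in range(count):
                values.append((bits >> (i * bit_width)) & mask)
        else:
            # RLE run: (header >> 1) repetitions of one little-endian value.
            count = header >> 1
            value = int.from_bytes(data[pos:pos + value_bytes], "little")
            pos += value_bytes
            values.extend([value] * count)
    # A bit-packed run may be padded past the real value count; trim it.
    return values[:num_values]


# Usage: an RLE run of five 1s, then one bit-packed group of 8 one-bit values.
levels = decode_hybrid_runs(bytes([0x0A, 0x01, 0x03, 0x8D]), 1, 13)
```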

> Boolean format in Plain Decoder 
> --------------------------------
>
>                 Key: PARQUET-574
>                 URL: https://issues.apache.org/jira/browse/PARQUET-574
>             Project: Parquet
>          Issue Type: Improvement
>            Reporter: Fabrizio Milo
>            Priority: Trivial
>
> The Encodings.md document states that the plain encoder for booleans uses
> [RLE/BitPacking](https://github.com/apache/parquet-format/blob/master/Encodings.md#plain-plain--0),
> while the cpp implementation seems to just use [simple back-to-back bit
> decoding](https://github.com/apache/parquet-cpp/blob/master/src/parquet/encodings/plain-encoding.h#L151).
> Which one is the right format?
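To make the two readings concrete, "simple bit decoding back to back" can be
sketched as one bit per value, LSB first within each byte, with no length
prefix and no run headers. This is a minimal sketch assuming that layout; the
helper names are hypothetical and this is not taken from the parquet-cpp
source.

```python
def encode_plain_booleans(values: list[bool]) -> bytes:
    """Pack booleans one bit each, LSB first, zero-padding the last byte."""
    out = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def decode_plain_booleans(data: bytes, num_values: int) -> list[bool]:
    """Unpack num_values booleans from LSB-first packed bytes."""
    return [bool((data[i // 8] >> (i % 8)) & 1) for i in range(num_values)]


flags = [True, False, True, True, False, False, False, True, True]
packed = encode_plain_booleans(flags)
```

With a bit width of 1, these bytes match the payload of a hybrid bit-packed
run, which may be why the spec links to the RLE/bit-packing section; the
practical difference is whether a reader should expect a length prefix and
varint run headers in front of the bits.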



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
