[
https://issues.apache.org/jira/browse/PARQUET-574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15218740#comment-15218740
]
Fabrizio Milo commented on PARQUET-574:
---------------------------------------
[~rdblue] So is the length part omitted, meaning it is implicitly the length of
the whole remaining data page, or is the length also written as a 4-byte
unsigned integer before the actual RLE runs?
I am puzzled by this <length> <encoded data> layout. See also
https://issues.apache.org/jira/browse/PARQUET-575
It seems the length is present only for the definition levels; if that is the
case, the format spec should be updated.
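
To make the second reading concrete, here is a minimal sketch of what a reader would do if the length is written as a 4-byte unsigned prefix in front of the RLE runs. The function name and the little-endian byte order are assumptions made for illustration; the sketch does not settle which of the two readings is correct.

```python
import struct
from typing import Tuple


def split_length_prefixed(buf: bytes) -> Tuple[bytes, bytes]:
    """Split buf into (encoded_data, rest).

    Assumes the layout <length> <encoded data>, where <length> is a
    4-byte unsigned integer (little-endian is an assumption here) giving
    the size of <encoded data> in bytes.
    """
    if len(buf) < 4:
        raise ValueError("buffer too short for a 4-byte length prefix")
    (length,) = struct.unpack_from("<I", buf, 0)
    end = 4 + length
    if end > len(buf):
        raise ValueError("declared length exceeds the buffer size")
    return buf[4:end], buf[end:]


# Hypothetical page: a 3-byte encoded block followed by one unrelated byte.
page = struct.pack("<I", 3) + b"\x01\x02\x03" + b"\xff"
encoded, rest = split_length_prefixed(page)
```

Under the first reading there would be no prefix at all, and the reader would treat everything up to the end of the data page as the encoded data.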
> Boolean format in Plain Decoder
> --------------------------------
>
> Key: PARQUET-574
> URL: https://issues.apache.org/jira/browse/PARQUET-574
> Project: Parquet
> Issue Type: Improvement
> Reporter: Fabrizio Milo
> Priority: Trivial
>
> The Encodings.md document states that the plain encoder for boolean values
> uses
> [RLE/BitPacking](https://github.com/apache/parquet-format/blob/master/Encodings.md#plain-plain--0)
>
> However, the cpp implementation seems to use just [simple bit decoding back
> to
> back.](https://github.com/apache/parquet-cpp/blob/master/src/parquet/encodings/plain-encoding.h#L151)
> Which one is the right format?
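
For reference, "simple bit decoding back to back" can be sketched as follows: each boolean occupies one bit, with no run headers and no length prefix. The function name is hypothetical, and the least-significant-bit-first order within each byte is an assumption of this sketch, not something stated in the issue.

```python
from typing import List


def decode_plain_booleans(buf: bytes, num_values: int) -> List[bool]:
    """Decode num_values booleans packed one bit each, back to back.

    Assumes value i lives in bit (i % 8) of byte (i // 8), i.e. the
    least significant bit of each byte comes first.
    """
    if num_values > len(buf) * 8:
        raise ValueError("not enough bytes for the requested number of values")
    return [bool((buf[i // 8] >> (i % 8)) & 1) for i in range(num_values)]


# 0x05 is 0b00000101, so the first three values are True, False, True.
values = decode_plain_booleans(b"\x05", 3)
```

An RLE/bit-packing hybrid stream would instead begin each run with a header describing the run, so the two layouts are not interchangeable for a reader.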
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)