[
https://issues.apache.org/jira/browse/PARQUET-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15731240#comment-15731240
]
Li commented on PARQUET-792:
----------------------------
Printing the file metadata with parquet-cli shows that each all-null field
takes roughly 0.2 bytes per record. With thousands of such fields, the
all-null fields together waste more space than the actual payload.
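
A rough back-of-the-envelope sketch of how that overhead adds up. Only the
0.2 bytes per field per record comes from the measurement above; the field
count (2,000) and record count (1,000,000) are illustrative assumptions, not
numbers from the actual dataset, and the function name is hypothetical.

```python
def null_overhead_bytes(records: int,
                        null_fields: int,
                        bytes_per_field_per_record: float = 0.2) -> float:
    """Estimate the storage spent on all-null fields.

    bytes_per_field_per_record defaults to the ~0.2 bytes observed with
    parquet-cli; the other inputs are supplied by the caller.
    """
    return records * null_fields * bytes_per_field_per_record


# Assumed workload: 2,000 all-null fields, 1,000,000 records.
per_record = null_overhead_bytes(records=1, null_fields=2000)
total = null_overhead_bytes(records=1_000_000, null_fields=2000)

print(f"overhead per record: {per_record:.0f} bytes")
print(f"overhead in total:   {total / 1e6:.0f} MB")
```

Under these assumptions the all-null fields cost about 400 bytes per record,
so any record whose real payload is smaller than that is dominated by the
overhead.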
> Skip the storage of repetition level and definition level for all-null column
> -----------------------------------------------------------------------------
>
> Key: PARQUET-792
> URL: https://issues.apache.org/jira/browse/PARQUET-792
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-mr
> Reporter: Li
> Priority: Minor
>
> I have a very sparse protobuf message in my project, with thousands of fields.
> In practice, most of the fields contain only null values within any given page,
> yet their repetition and definition levels still take up a lot of storage space.
> Can Parquet skip storing the repetition and definition levels for such
> all-null columns to save storage space?