[
https://issues.apache.org/jira/browse/PARQUET-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15732903#comment-15732903
]
Uwe L. Korn commented on PARQUET-792:
-------------------------------------
The main reason for the additional storage, compared to the 2-3 bytes I
mentioned, is that the NULL values don't occur at the same nesting level. Since
you are not interested in this information, you could probably reduce the
storage size by ensuring that, if a column is all-null, the null value occurs
at the same (nesting) level in every row.
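A rough way to see why the nesting level matters: Parquet stores definition levels with its RLE/bit-packing hybrid encoding, so a page in which every row has the same level collapses into a single RLE run, while levels that vary from row to row cannot. The sketch below is a simplified estimate, not parquet-mr's writer. It assumes a hypothetical column path `a.b.c` of three optional fields (max definition level 3, bit width 2). It compares writing the whole page as RLE runs with writing it as one bit-packed run and ignores the per-run mixing a real encoder does, as well as the 4-byte length prefix of v1 data pages.

```python
from typing import Any, Dict, List

# Assumption: hypothetical column path a.b.c where every field is optional,
# so definition levels range from 0 to 3.
PATH = ("a", "b", "c")
MAX_DEF_LEVEL = len(PATH)
BIT_WIDTH = MAX_DEF_LEVEL.bit_length()  # 2 bits for levels 0..3


def definition_level(record: Dict[str, Any]) -> int:
    """Number of optional fields along PATH that are defined (non-null)."""
    level = 0
    node: Any = record
    for name in PATH:
        if not isinstance(node, dict) or node.get(name) is None:
            return level
        node = node[name]
        level += 1
    return level


def varint_len(n: int) -> int:
    """Bytes needed to store n as an unsigned LEB128 varint."""
    length = 1
    while n >= 0x80:
        n >>= 7
        length += 1
    return length


def rle_only_size(levels: List[int]) -> int:
    """Size if every run of equal levels is written as one RLE run."""
    value_bytes = (BIT_WIDTH + 7) // 8
    size, i = 0, 0
    while i < len(levels):
        j = i
        while j < len(levels) and levels[j] == levels[i]:
            j += 1
        # Run header is varint(run_length << 1), followed by the repeated value.
        size += varint_len((j - i) << 1) + value_bytes
        i = j
    return size


def bit_packed_only_size(levels: List[int]) -> int:
    """Size if all levels go into one bit-packed run of 8-value groups."""
    groups = (len(levels) + 7) // 8
    # Header is varint((groups << 1) | 1); each group takes BIT_WIDTH bytes.
    return varint_len((groups << 1) | 1) + groups * BIT_WIDTH


def estimated_size(records: List[Dict[str, Any]]) -> int:
    """Simplified estimate: the cheaper of the two whole-page strategies."""
    levels = [definition_level(r) for r in records]
    return min(rle_only_size(levels), bit_packed_only_size(levels))


n = 10_000
# c is null in every row, and always at the same level (2).
same_level = [{"a": {"b": {}}}] * n
# c is also null in every row, but the null appears at level 1 or level 0.
mixed_levels = [{} if i % 2 else {"a": {}} for i in range(n)]
print(estimated_size(same_level))    # 4
print(estimated_size(mixed_levels))  # 2502
```

Both pages hold only nulls for `c`, yet the uniform one needs a few bytes while the mixed one grows linearly with the row count, which is the effect described in the comment.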
> Skip the storage of repetition level and definition level for all-null column
> -----------------------------------------------------------------------------
>
> Key: PARQUET-792
> URL: https://issues.apache.org/jira/browse/PARQUET-792
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-mr
> Reporter: Li
> Priority: Minor
>
> I have a very sparse protobuf message in my project, with thousands of fields.
> In practice, most of the fields contain only null values within a page.
> But the repetition and definition levels take up a lot of storage space.
> Can Parquet skip storing the repetition and definition levels for such
> all-null columns to save storage space?
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)