[ 
https://issues.apache.org/jira/browse/PARQUET-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15725264#comment-15725264
 ] 

Uwe L. Korn commented on PARQUET-792:
-------------------------------------

Repetition and definition levels should not take more than 2-3 bytes per page 
if all values in the page are null at the same level, because a run of 
identical levels is run-length encoded. 
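
As a rough illustration of where that estimate comes from, here is a minimal sketch of how one run of identical levels could be encoded. It assumes the levels use Parquet's RLE/bit-packing hybrid encoding and that the whole page forms a single run. The helper names ({{uleb128}}, {{rle_run}}) are made up for this example and are not part of parquet-mr, and any per-page length prefix or page header overhead is ignored.

```python
def uleb128(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def rle_run(value: int, count: int, bit_width: int) -> bytes:
    """Encode `count` repetitions of `value` as one RLE run.

    Layout assumed here: varint(count << 1), where the low bit 0 marks
    an RLE run, followed by the value padded to whole bytes.
    """
    header = uleb128(count << 1)
    value_bytes = value.to_bytes((bit_width + 7) // 8, "little")
    return header + value_bytes


# 1000 null values in a page whose maximum definition level is 1
# (bit width 1): every definition level is 0.
encoded = rle_run(value=0, count=1000, bit_width=1)
print(len(encoded), encoded.hex())
```

Under these assumptions the size grows only with the number of varint bytes needed for the run length, not with the number of values in the page.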

How did you come to the conclusion that the repetition and definition levels 
take up so much space?

To get more insight into your files, you could use the new {{parquet-cli}} 
tool to inspect the sizes. See this PR for the new tool:
https://github.com/apache/parquet-mr/pull/384 

> Skip the storage of repetition level and definition level for all-null column
> -----------------------------------------------------------------------------
>
>                 Key: PARQUET-792
>                 URL: https://issues.apache.org/jira/browse/PARQUET-792
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>            Reporter: Li
>            Priority: Minor
>
> I have a very sparse protobuf message in my project, with thousands of fields.
> In practice, most of the fields contain only null values within a given page,
> but the repetition and definition levels still take a lot of storage space.
> Can Parquet skip storing the r and d levels for such all-null columns 
> to save storage space?



