[ 
https://issues.apache.org/jira/browse/PARQUET-1059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087069#comment-16087069
 ] 

Dapeng Sun commented on PARQUET-1059:
-------------------------------------

Hi [~xhochy], 
{quote}
Can you describe a workload where this would bring a significant difference? 
{quote}
In my case, the values in the column may be increasing or decreasing, but the 
difference between adjacent values is very small, so the dictionary IDs may 
also be adjacent or close together. If the ID encoding supported delta 
encoding, I think it would save more disk space.
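
To make the expected saving concrete, here is a minimal Python sketch. It is
only a simplified cost model, not the actual Parquet format: the ID sequence
is made up, the helper names are hypothetical, and the 32-bit header fields
are an assumption. It ignores the block and miniblock layout that
{{DeltaBinaryPackingWriter}} really uses, and it ignores the RLE runs of the
hybrid encoder (the sample data has no long runs).

```python
def bit_width(max_value: int) -> int:
    """Bits needed to store non-negative integers up to max_value."""
    return max_value.bit_length()


def packed_bits_raw(ids):
    """Cost when every ID is bit-packed at the width of the largest ID."""
    return len(ids) * bit_width(max(ids))


def packed_bits_delta(ids):
    """Cost of a simplified delta scheme (assumed layout, not Parquet's):
    first value and minimum delta as 32-bit fields, then each
    (delta - min_delta) bit-packed at a single width."""
    if len(ids) < 2:
        return 32 * len(ids)
    deltas = [b - a for a, b in zip(ids, ids[1:])]
    min_delta = min(deltas)
    adjusted = [d - min_delta for d in deltas]  # all values are >= 0
    header_bits = 32 + 32
    return header_bits + len(adjusted) * bit_width(max(adjusted))


# Hypothetical dictionary IDs: slowly rising, with small steps up and down.
ids = [500 + i + (i % 3) for i in range(1000)]

print("raw bit-packing:", packed_bits_raw(ids), "bits")
print("delta packing:  ", packed_bits_delta(ids), "bits")
```

For this made-up sequence the raw IDs need 11 bits each, while the adjusted
deltas fit in 2 bits each, so under these assumptions the delta variant is
several times smaller. Real data and the real block layout would give
different numbers.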


> Improve the RLE encoding for Parquet Dictionary IDs
> ---------------------------------------------------
>
>                 Key: PARQUET-1059
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1059
>             Project: Parquet
>          Issue Type: Improvement
>            Reporter: Dapeng Sun
>
> The IDs of Parquet dictionary encoding are encoded using 
> {{RunLengthBitPackingHybridEncoder}}.
> RunLengthBitPackingHybridEncoder handles encoding with {{repeat}} and 
> {{bitpacking}}; we should improve it with a method like 
> {{DeltaBinaryPackingWriter}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
