[jira] [Comment Edited] (PARQUET-1059) Improve the RLE encoding for Parquet Dictionary IDs

Dapeng Sun (JIRA) Fri, 14 Jul 2017 02:16:23 -0700

    [ 
https://issues.apache.org/jira/browse/PARQUET-1059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087069#comment-16087069
 ]


Dapeng Sun edited comment on PARQUET-1059 at 7/14/17 9:15 AM:
--------------------------------------------------------------

Hi [~xhochy], 
{quote}
Can you describe a workload where this would bring a significant difference? 
{quote}
In my case, the values of the column may be increasing or decreasing, but the 
change between adjacent values is very small, so the dictionary IDs may also be 
adjacent or close together. If the IDs were encoded with delta encoding, I think 
it would save more disk space.
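
As a rough illustration of the argument above, the sketch below compares the bits needed to bit-pack a slowly increasing ID sequence against a simplified delta scheme. It is a toy model under stated assumptions, not the actual Parquet on-disk format: the function names, the sample IDs, and the fixed 32-bit header fields are all hypothetical, and real delta binary packing works on blocks and miniblocks with its own headers.

```python
def bit_packed_bits(ids):
    """Bits needed to bit-pack every ID at the width of the largest ID."""
    width = max(ids).bit_length()
    return width * len(ids)


def delta_packed_bits(ids, header_field_bits=32):
    """Bits for a simplified delta scheme (assumes at least two IDs).

    Assumed layout: the first value and the minimum delta are each stored
    in a fixed-size header field, then every delta is stored as an offset
    from the minimum delta, packed at the width of the largest offset.
    """
    deltas = [b - a for a, b in zip(ids, ids[1:])]
    min_delta = min(deltas)
    width = (max(deltas) - min_delta).bit_length()
    return 2 * header_field_bits + width * len(deltas)


# Hypothetical workload: IDs that grow by 0 or 1 from one row to the next.
ids = [i // 2 for i in range(1024)]

print("bit-packed bits:", bit_packed_bits(ids))
print("delta-packed bits:", delta_packed_bits(ids))
```

In this model the bit-packed width grows with the largest ID, while the delta width depends only on the spread of the differences, which is why small changes between adjacent IDs would favor delta encoding.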



was (Author: dapengsun):
Hi [~xhochy], 
{quote}
Can you describe a workload where this would bring a significant difference? 
{quote}
In my case, the value column may be incremental or decreasing, but the change 
of the adjoining values is very small, so the dictionary IDs may also be 
adjoining or near. If the IDs encoding support Delta, I think it would save 
more disk space.


> Improve the RLE encoding for Parquet Dictionary IDs
> ---------------------------------------------------
>
>                 Key: PARQUET-1059
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1059
>             Project: Parquet
>          Issue Type: Improvement
>            Reporter: Dapeng Sun
>
> The IDs in Parquet dictionary encoding are written using 
> {{RunLengthBitPackingHybridEncoder}}.
> {{RunLengthBitPackingHybridEncoder}} encodes values with {{repeat}} and 
> {{bitpacking}}; we should improve it with a method like 
> {{DeltaBinaryPackingWriter}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
